Triaging Findings

Last updated April 2026

A finished scan is the start of the work, not the end. A directory of a few thousand files easily produces tens of thousands of matches, and even a high-fidelity scanner like PII Crawler will surface some patterns that look like PII but aren't — sample data in a SQL fixture, a developer's test SSN like 123-45-6789, an email address in a vendor's license file. Triage is the process of separating real findings from noise so the remaining list is something you can act on: hand to legal, attach to a DSAR response, or route to a remediation queue.

This guide covers the triage workflow in both the TUI and the Web UI, the false-positive model (which is more flexible than a simple "ignore" flag), and how to export a clean report at the end.

The verdict model

Every match in a scan has a verdict: Unreviewed by default, and either True Positive or False Positive once you decide. Verdicts are persisted to the scan database, so you can pause and resume triage, and reruns of the same scan don't lose your work.

You can apply a verdict at three different scopes, listed here from most to least specific:

  • Match — only this exact finding in this exact file
  • Text — every finding with the same value (e.g. every occurrence of 123-45-6789) within the scan
  • File — every finding in this file

A match-scope verdict overrides text- and file-scope verdicts at the same location. This lets you make a broad call ("this whole vendor manifest is FP") and still confirm one match inside it as a real positive without un-marking the file.
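The override rule can be sketched as a small lookup that checks the most specific scope first. This is an illustrative model, not PII Crawler's implementation: the `Match` and `Rule` types, the `effective_verdict` function, and the `"true_positive"` verdict string are all hypothetical names, and the ordering of text scope ahead of file scope is an assumption (the guide only states that match scope wins over both).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Match:
    match_id: int
    file_id: int
    pii_type: str
    term: str


@dataclass(frozen=True)
class Rule:
    scope: str    # "match", "text", or "file"
    verdict: str  # "true_positive" (assumed name) or "false_positive"
    match_id: Optional[int] = None
    file_id: Optional[int] = None
    pii_type: Optional[str] = None
    term: Optional[str] = None


def applies(rule: Rule, m: Match) -> bool:
    """Return True if the rule covers this match at the rule's own scope."""
    if rule.scope == "match":
        return rule.match_id == m.match_id
    if rule.scope == "text":
        return (rule.pii_type, rule.term) == (m.pii_type, m.term)
    if rule.scope == "file":
        return rule.file_id == m.file_id
    return False


# Match beating text and file comes from the guide.
# Text beating file is an assumption made for this sketch.
PRECEDENCE = ("match", "text", "file")


def effective_verdict(m: Match, rules: list) -> str:
    """Return the verdict of the most specific applicable rule."""
    for scope in PRECEDENCE:
        for rule in rules:
            if rule.scope == scope and applies(rule, m):
                return rule.verdict
    return "unreviewed"


# A vendor manifest marked FP as a whole, with one match confirmed inside it.
rules = [
    Rule(scope="file", verdict="false_positive", file_id=7),
    Rule(scope="match", verdict="true_positive", match_id=42),
]
confirmed = Match(match_id=42, file_id=7, pii_type="ssn", term="078-05-1120")
noise = Match(match_id=43, file_id=7, pii_type="ssn", term="123-45-6789")
print(effective_verdict(confirmed, rules))  # true_positive
print(effective_verdict(noise, rules))      # false_positive
```

The file-scope rule stays in place for everything else in the file; only the one match with its own verdict is treated differently.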

Why three scopes

Most "ignore" features only let you hide a single finding at a time. That is unworkable when one developer test SSN appears 400 times across a repo, or when the same backup folder is scanned weekly. Text-scope says "this exact value is never PII in this scan, anywhere"; file-scope says "this file is noise, drop everything in it." Match-scope is the precise scalpel for the cases where neither broad rule fits.

Path A: Triage in the TUI

The TUI is built for fast keyboard-driven triage. From the scan list, press Enter to open a scan, then f to open Findings.

[Screenshot: TUI Findings view filtered to SSN, with several findings already marked as false positives shown alongside live ones]

Review mode

The headline feature is Review mode (r). It pre-filters out anything you've already decided and walks you through the rest one group at a time, accepting single-key verdicts:

Key   Verdict
f     False positive (auto-decides match vs. text scope, see below)
t     True positive
F     File false positive (every match in this file)
T     File true positive
s     Skip (no verdict, move on)
.     Repeat the last verdict
u     Undo the last verdict
r     Exit review mode

[Screenshot: TUI in Review mode showing the verdict legend at the bottom: f=FP, t=TP, F/T=file, s=skip, .=repeat, u=undo, r=exit]

Review mode picks the scope for f and t automatically. If the selected term only appears in one file in the current view, it applies a match-scope verdict. If the same (pii_type, term) appears across multiple files, it applies a text-scope verdict — one keypress triages every occurrence of that value at once. This is the right default for the most common case (a recurring placeholder or test value spread across a tree).
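The scope decision described above can be sketched in a few lines. This is an illustration of the rule as stated, not the TUI's code: `choose_scope` is a hypothetical name, and representing each visible finding as a `(pii_type, term, file_path)` tuple is an assumption.

```python
def choose_scope(selected, visible_matches):
    """Pick the verdict scope for a single-key f or t in review mode.

    `selected` and every item of `visible_matches` are
    (pii_type, term, file_path) tuples (an assumed representation).
    """
    pii_type, term, _ = selected
    # Distinct files in the current view that contain the same (pii_type, term).
    files = {
        path
        for (other_type, other_term, path) in visible_matches
        if (other_type, other_term) == (pii_type, term)
    }
    return "text" if len(files) > 1 else "match"


view = [
    ("ssn", "123-45-6789", "tests/fixtures/users.sql"),
    ("ssn", "123-45-6789", "docs/example.md"),
    ("ssn", "078-05-1120", "exports/payroll.csv"),
]
print(choose_scope(view[0], view))  # text
print(choose_scope(view[2], view))  # match
```

Because the check runs against the current view, an active filter changes which files count toward the decision.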

u is the safety net. If you mash f on something that turned out to be real, undo restores the prior verdict and puts the group back into the review queue.

When you're not in review mode

You can also triage without entering review mode:

  • f — mark the selected match as a false positive
  • F — mark the selected match's text (every occurrence) as a false positive
  • Ctrl+f — mark the selected match's whole file as a false positive
  • h — toggle whether false positives are hidden or shown in the list

h is what you use to audit your own decisions: flip it on to see everything you've marked FP in case you need to revert one.

Filtering before you triage

The Findings view supports filters (/) by PII type, by match text, and by extension. A common pattern: filter to ssn only, then enter review mode. That way you triage one PII type at a time with full attention, instead of context-switching between credit cards and email addresses every keystroke.

Masking for screen sharing

If you're triaging on a call or recording a walkthrough, press m to mask findings on screen. SSNs collapse to ***-**-1234, emails to jo*****e@ex***le.com, and so on. The masking is display-only — the underlying data is unchanged.
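For a sense of what display-only masking involves, here is a minimal sketch that reproduces the two example outputs above. The rules are inferred from those examples alone (keep the last four SSN digits; keep the first two and last one characters of the email local part, and the first two and last two of the first domain label), so PII Crawler's actual masking may differ. The function names are hypothetical.

```python
def mask_middle(text: str, head: int, tail: int) -> str:
    """Keep `head` leading and `tail` trailing characters and star the rest."""
    if len(text) <= head + tail:
        return "*" * len(text)
    hidden = len(text) - head - tail
    return text[:head] + "*" * hidden + text[len(text) - tail:]


def mask_ssn(ssn: str) -> str:
    """Star every digit except the last four, leaving separators in place."""
    digits_seen = 0
    out = []
    for ch in reversed(ssn):
        if ch.isdigit():
            digits_seen += 1
            out.append(ch if digits_seen <= 4 else "*")
        else:
            out.append(ch)
    return "".join(reversed(out))


def mask_email(email: str) -> str:
    """Mask the local part and the first domain label, keeping the rest."""
    local, _, domain = email.partition("@")
    name, dot, rest = domain.partition(".")
    return mask_middle(local, 2, 1) + "@" + mask_middle(name, 2, 2) + dot + rest


print(mask_ssn("123-45-1234"))             # ***-**-1234
print(mask_email("john.doe@example.com"))  # jo*****e@ex***le.com
```

Both functions return a new string and never touch the stored value, which mirrors the display-only behaviour described above.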

Path B: Triage in the Web UI

If you'd rather work in a browser, run piicrawler serve and open http://localhost:3001. For regular scan findings, the Web UI exposes text- and file-scope verdicts (match-scope verdicts on regular findings are TUI-only):

  • FP All Files on each match group — text-scope verdict, hides every occurrence of that exact value across the scan
  • Mark Entire File as False Positive on the file detail page — file-scope verdict, hides every match in the file

[Screenshot: Web UI match row with FP All Files button and a highlighted SSN in the context preview]

The Web UI is the right tool when you want to read the surrounding context of a match before deciding: each match group expands to show the line of text it was extracted from, with the matched value highlighted. Use the TUI's Review mode when you've already settled on a heuristic and just want to work through the queue.

The full three-scope dropdown (This finding / All "[term]" / [Name] + [type]) is exposed on PII Identity Scan results — see PII Identity Scan. Regular scan findings in the Web UI use the two scopes above.

Exporting a clean report

Once you've worked through the list, you'll usually want a CSV that contains only the true positives.

From the TUI: in the scan detail view, press e to export. The export honours your show / hide false positives toggle (h). Hide false positives first (the default), then export — the resulting CSV will have FP-marked findings filtered out.

From the Web UI: the CSV download includes all findings regardless of FP state. Filter in your spreadsheet, or use the TUI export when you need a pre-filtered file.

From the database: every scan's data lives in the shared SQLite file at ~/.piicrawler/piicrawler.db (filter by scan_id to scope a query to a single scan). Two queries that come up a lot:

Count findings not covered by any false-positive rule, grouped by PII type, after triage (anything still Unreviewed is included in the count):

SELECT m.pii_type, COUNT(*) AS n
FROM scan_matches m
WHERE m.scan_id = ?
  AND NOT EXISTS (
    SELECT 1 FROM scan_false_positives fp
    WHERE fp.scan_id = m.scan_id
      AND fp.verdict = 'false_positive'
      AND (
        (fp.scope = 'match' AND fp.match_id = m.id)
        OR (fp.scope = 'text' AND fp.pii_type = m.pii_type AND fp.term = m.term)
        OR (fp.scope = 'file' AND fp.file_id = m.file_id)
      )
  )
GROUP BY m.pii_type
ORDER BY n DESC;

Files that are entirely false positives (nothing left to act on):

SELECT f.path
FROM scan_files f
JOIN scan_false_positives fp
  ON fp.scan_id = f.scan_id
 AND fp.scope = 'file'
 AND fp.file_id = f.id
 AND fp.verdict = 'false_positive'
WHERE fp.scan_id = ?;
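If you want these numbers in a script rather than a SQL shell, the first query can be run from Python's standard sqlite3 module. The sketch below assumes the table and column names exactly as they appear in the queries above; `open_readonly` and `remaining_by_type` are hypothetical helper names, and the type of scan_id (integer or string) is not specified here, so pass whatever your database uses.

```python
import sqlite3
from pathlib import Path

# Database location as given in this guide.
DB_PATH = Path.home() / ".piicrawler" / "piicrawler.db"

REMAINING_BY_TYPE = """
SELECT m.pii_type, COUNT(*) AS n
FROM scan_matches m
WHERE m.scan_id = ?
  AND NOT EXISTS (
    SELECT 1 FROM scan_false_positives fp
    WHERE fp.scan_id = m.scan_id
      AND fp.verdict = 'false_positive'
      AND (
        (fp.scope = 'match' AND fp.match_id = m.id)
        OR (fp.scope = 'text' AND fp.pii_type = m.pii_type AND fp.term = m.term)
        OR (fp.scope = 'file' AND fp.file_id = m.file_id)
      )
  )
GROUP BY m.pii_type
ORDER BY n DESC;
"""


def open_readonly(path: Path = DB_PATH) -> sqlite3.Connection:
    """Open the database read-only so a reporting script cannot alter triage state."""
    return sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)


def remaining_by_type(conn: sqlite3.Connection, scan_id):
    """Return (pii_type, count) rows for findings with no false-positive rule."""
    return conn.execute(REMAINING_BY_TYPE, (scan_id,)).fetchall()
```

A typical call would be `remaining_by_type(open_readonly(), scan_id)`, where `scan_id` identifies the scan you just triaged. Opening the file read-only is a design choice for this sketch, on the assumption that verdicts should only ever be written through the TUI or Web UI.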

See Results Storage for the full schema.

A workflow that scales

For large scans (10,000+ matches), order matters. Triaging top-down by file is much faster than triaging by match, because most noise concentrates in a few directories.

  1. Start at file scope. Open the Files view (l from scan detail) and look for the obvious noise: node_modules, vendor SDKs, test fixtures, .git history. Mark them FP at file scope (F). This usually removes 60–80% of findings in one pass.
  2. Then go to text scope. Sort findings by frequency. The top of the list is almost always recurring placeholder values (123-45-6789, a placeholder email address, lorem-ipsum addresses). Mark those FP at text scope (F in the Findings list, or f inside review mode when it auto-decides text scope).
  3. Finally, review what's left in review mode. What remains should be the genuinely interesting findings — small enough to walk through one keypress at a time.
  4. Export and hand off. Hide false positives (h), export CSV (e), and you have a clean list to attach to a ticket, a DSAR response, or a compliance report.
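The "sort findings by frequency" step can also be approximated directly against the database, which is handy for spotting text-scope candidates before opening the TUI. This sketch uses only the scan_matches columns that appear in the queries earlier in this guide; `top_terms` is a hypothetical helper name, and the real schema may offer a better way to do this.

```python
import sqlite3

TOP_TERMS = """
SELECT m.pii_type,
       m.term,
       COUNT(*) AS n,
       COUNT(DISTINCT m.file_id) AS files
FROM scan_matches m
WHERE m.scan_id = ?
GROUP BY m.pii_type, m.term
ORDER BY n DESC
LIMIT ?;
"""


def top_terms(conn: sqlite3.Connection, scan_id, limit: int = 20):
    """Return the most frequent (pii_type, term) pairs with match and file counts."""
    return conn.execute(TOP_TERMS, (scan_id, limit)).fetchall()
```

Values with a high match count spread over many files are the usual candidates for a text-scope false positive; a high count confined to one file suggests a file-scope verdict instead.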

Re-running scans

When the scan is re-run (a watched directory updates, or you re-trigger the scan manually), text- and file-scope FP rules are reapplied automatically — a new occurrence of an already-marked text value comes back already classified. Match-scope verdicts only apply to the specific match they were created for, so a new instance of "the same finding" in a re-extracted file will appear as Unreviewed again. This is intentional: the cheap rules generalise, the precise rules don't.

See also

  • PII Identity Scan — has its own per-run false-positive workflow scoped to identity associations
  • Results Storage — schema for the false-positive tables, useful for custom queries
  • CLI Reference: piicrawler dsar for cross-scan searches over the same triaged data