Agentic Triage

Last updated May 2026

A finished scan typically produces far more matches than you have time to read. Most are real, some are noise (test fixtures, sample data, vendor licence files), and the only way to a clean report is to label every one of them. The TUI's review mode is fast for an experienced user, but it still costs human attention per finding.

Agentic triage moves that loop to a language model. PII Crawler ships a JSON-first CLI under piicrawler findings that is designed to be driven by an LLM agent: pull unreviewed matches as JSON, classify each one, and write the verdicts back in a single atomic batch. The verdicts land in the same scan_false_positives table the TUI and web UI use, so you can hand off mid-flight without losing work.

This guide walks the full loop: the CLI surface, a worked example, and the practical tips for getting good signal out of an LLM.

Keep the data on infrastructure you trust

The whole point of triage is to look at sensitive matches: SSNs, payroll records, customer email addresses, the surrounding sentences they appear in. Sending that context to a third-party LLM endpoint means handing real PII to whoever runs the endpoint. Treat the model as part of your data perimeter, not an external service:

  • Local models (Ollama, llama.cpp, vLLM, LM Studio) keep every byte on the machine running the scan. This is the default recommendation, especially for regulated data (HIPAA, GDPR, CCPA, internal compliance regimes).
  • Private cloud endpoints (AWS Bedrock, Azure OpenAI, Google Vertex AI, Anthropic via your own enterprise tenancy) keep the data inside an account you control, under a written agreement that prohibits training on your inputs. Confirm "no training" and data-residency terms in writing before pointing the agent at production findings.
  • Public consumer APIs are not appropriate for raw scan context. If you can't avoid one, pass --redact so SSN and DOB digits never leave the binary in plaintext (see Redacting sensitive terms below). Redaction is a backstop, not a replacement for keeping the data inside your perimeter.

The Python example below uses the public Anthropic API for brevity. For a real triage pass, swap the Anthropic() client for a Bedrock, Vertex, or local-runtime client of your choosing — the JSON contract on either side of the model is identical.
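Ollama and vLLM, for example, both expose OpenAI-compatible endpoints, so the local swap can be a two-line change. A sketch, assuming an Ollama server on its default port, the openai Python SDK, and whatever model name you have pulled locally:

from openai import OpenAI

# Assumption: a local Ollama server speaking the OpenAI-compatible API.
# The api_key is required by the SDK but ignored by Ollama.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.chat.completions.create(
    model="llama3.1",  # hypothetical: substitute your local model
    messages=[{"role": "user", "content": "Classify the matches below..."}],
)
print(resp.choices[0].message.content)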

When to use an agent vs. the TUI

An LLM is a good fit when the call is largely about context: "is this 123-45-6789 in a unit test, an HR spreadsheet, or a customer letter?" The model reads the surrounding text and decides. It is a poor fit when the call is about company knowledge ("our test SSN is 987-65-4321, anything else is real") — encode those rules at text scope yourself first, then let the agent grind through the long tail.

A pragmatic split:

  1. You mark obvious noise files at file scope (F in the TUI) — vendor SDKs, fixtures, node_modules.
  2. You mark known placeholder values at text scope — 123-45-6789, test@example.com.
  3. The agent triages whatever is left, one match at a time, with the surrounding context as input.
  4. You spot-check the model's verdicts and export the clean CSV.

The CLI is symmetric: findings unmark reverses anything the agent gets wrong, and findings stats lets the agent (and you) check progress without re-reading every match.
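As a sketch, the first two steps are single mark calls, using the selectors and the --verdict flag described below (the file ID and terms here are illustrative):

piicrawler findings mark --scan 42 --file 7 --verdict fp
piicrawler findings mark --scan 42 --text ssn 123-45-6789 --verdict fp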

The CLI surface

Four commands cover the whole loop. Full flag reference lives in piicrawler findings; the shapes below are the parts an agent needs.

findings list — read

piicrawler findings list --scan 42 --json --limit 50

Emits one JSON object per match. Defaults to --verdict unreviewed, which is the natural starting point for a triage pass. --context surrounding (the default) returns a small snippet of text around the match — enough for an LLM to decide, without burning tokens on full pages.

{
  "id": 101,
  "scan_id": 42,
  "file_id": 7,
  "file_path": "/data/hr/2026-q1.csv",
  "pii_type": "ssn",
  "term": "123-45-6789",
  "start": 1024,
  "end": 1035,
  "context": { "surrounding": "...Employee SSN: 123-45-6789 filed on 2026-02-04..." },
  "verdict": "unreviewed"
}

Use --limit and --offset to page through large scans. --pii-type ssn --pii-type email narrows the batch to one or two types so you can specialise the prompt.
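A second page of ssn-only matches, for example:

piicrawler findings list --scan 42 --json --limit 50 --offset 50 --pii-type ssn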

findings mark — write

Three selectors mirror the TUI scopes:

Selector                    Scope                                         Use when
--match <id>                one finding                                   The model has read this exact context
--text <pii_type> <term>    every match of that value across the scan    A placeholder like 000-00-0000 is FP everywhere
--file <id>                 every match in a file                         The whole file is noise (a fixture, a vendor manifest)

For an agent that decides one finding at a time, --match is the right selector. The text and file scopes are useful when a human (or a smarter agent) can make a generalisation across many matches in one call.
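The single-finding write, using the match ID from the list output above:

piicrawler findings mark --scan 42 --match 101 --verdict fp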

findings mark --from-json — bulk write

This is the path agents should default to. One file, one transaction:

piicrawler findings mark --scan 42 --from-json verdicts.json

verdicts.json:
[
  { "match_id": 101, "verdict": "fp" },
  { "match_id": 102, "verdict": "tp" },
  { "text": { "pii_type": "ssn", "term": "000-00-0000" }, "verdict": "fp" },
  { "file_id": 9, "verdict": "tp" }
]

Each entry carries its own verdict and exactly one selector. If any entry is malformed, the CLI exits non-zero and no verdicts are written. That all-or-nothing behaviour means a flaky agent run never leaves the database half-updated.

Passing - as the file argument reads JSON from stdin, which is what you want when the agent is the upstream process:

my-agent --scan 42 | piicrawler findings mark --scan 42 --from-json -

findings stats — check progress

piicrawler findings stats --scan 42 --json
{
  "scan_id": 42,
  "totals": { "unreviewed": 120, "false_positive": 32, "true_positive": 8 },
  "by_pii_type": [
    { "pii_type": "ssn",   "unreviewed": 50, "false_positive": 10, "true_positive": 5 },
    { "pii_type": "email", "unreviewed": 70, "false_positive": 22, "true_positive": 3 }
  ]
}

A natural agent loop terminates when totals.unreviewed == 0, or when a per-type budget is hit.
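With jq installed, that check is a one-liner:

piicrawler findings stats --scan 42 --json | jq '.totals.unreviewed'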

A complete worked example

The agent below uses the Anthropic SDK to triage one batch. It is deliberately small so the moving parts are visible — wrap it in a loop for a full scan.

import json, subprocess
from anthropic import Anthropic

SCAN_ID = 42
BATCH = 25
client = Anthropic()

def fetch_unreviewed():
    out = subprocess.check_output([
        "piicrawler", "findings", "list",
        "--scan", str(SCAN_ID),
        "--json", "--limit", str(BATCH),
        "--context", "surrounding",
        # Drop --redact when running against a local or private-cloud model.
        "--redact",
    ])
    # findings list emits one JSON object per match; accept either a single
    # top-level array or newline-delimited objects.
    text = out.decode()
    try:
        parsed = json.loads(text)
        return parsed if isinstance(parsed, list) else [parsed]
    except json.JSONDecodeError:
        return [json.loads(line) for line in text.splitlines() if line.strip()]

def classify(matches):
    prompt = (
        "For each match, decide if it is real PII (tp) or noise (fp). "
        "Noise includes test fixtures, sample data, vendor licences, and "
        "obvious placeholders. Reply with a JSON array of "
        '{"match_id": <id>, "verdict": "fp"|"tp"} entries — nothing else.\n\n'
        + json.dumps(matches, indent=2)
    )
    resp = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    # Strict parse: any drift from the pinned JSON shape fails loudly here
    # rather than writing a malformed batch.
    return json.loads(resp.content[0].text)

def apply(verdicts):
    proc = subprocess.run(
        ["piicrawler", "findings", "mark",
         "--scan", str(SCAN_ID), "--from-json", "-"],
        input=json.dumps(verdicts),
        text=True, check=True, capture_output=True,
    )
    print(proc.stdout)

# Loop until no unreviewed matches remain. This assumes the model returns a
# verdict for every match in the batch; if it skips some, the same findings
# come back on the next fetch.
while True:
    batch = fetch_unreviewed()
    if not batch:
        break
    apply(classify(batch))

Run it, then check progress:

piicrawler findings stats --scan 42

Two implementation notes that matter in practice:

  • Pin the JSON shape in the prompt. Models drift if you ask them to "return verdicts." Specifying the exact array shape, and parsing it strictly on return, surfaces drift early instead of letting a malformed batch silently corrupt the run. A validator sketch follows this list.
  • One transaction per batch. --from-json rolls the whole batch back if any entry is malformed. That is the property you want — a partial write would mean re-pulling a mixed list of "already done by the agent" and "not yet" findings on the next iteration.
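A minimal strict-parse sketch for the first point, assuming the batch and verdict shapes used in the worked example (validate is a hypothetical helper, not part of the CLI; call it between classify and apply):

def validate(verdicts, batch):
    # Reject anything outside the pinned shape before it reaches --from-json.
    ids = {m["id"] for m in batch}
    for v in verdicts:
        if set(v) != {"match_id", "verdict"}:
            raise ValueError(f"unexpected keys: {v}")
        if v["verdict"] not in ("fp", "tp"):
            raise ValueError(f"bad verdict: {v}")
        if v["match_id"] not in ids:
            raise ValueError(f"unknown match_id: {v}")
    return verdicts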

How the verdict model behaves

Verdicts written from the CLI are read by every other surface (TUI, web UI, HTML report, dsar). A few rules to keep in mind when an agent is making decisions:

  • Specificity wins on read. A match-scope verdict overrides a text-scope verdict, which overrides a file-scope verdict at the same location. Mass-marking --text ssn 000-00-0000 fp and then later overriding one occurrence with --match 42 tp works the way you expect; the sketch after this list shows the two calls.
  • unmark takes the same selectors as mark. It also requires --verdict <fp|tp>, because the FP and TP rows are stored separately.
  • --from-json ignores any --verdict flag on the command line. Each entry's own verdict field is the source of truth.
  • Re-running a scan reapplies text- and file-scope verdicts. Match-scope verdicts only attach to the specific match they were written for, so a re-extracted file produces fresh unreviewed matches at match scope. This is intentional: cheap rules generalise, precise rules don't.
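The override from the first rule, sketched as two calls: mark the placeholder everywhere, then flip the one real occurrence back.

piicrawler findings mark --scan 42 --text ssn 000-00-0000 --verdict fp
piicrawler findings mark --scan 42 --match 42 --verdict tp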

Redacting sensitive terms before they leave the machine

The first line of defence is keeping the data on infrastructure you trust. The second line, for cases where you can't avoid an external endpoint, is format-preserving redaction: PII Crawler can rewrite the most sensitive terms in findings list output before they ever leave the binary, so what reaches the model is structurally identical but no longer the original digits.

Pass --redact to opt in:

piicrawler findings list --scan 42 --redact --json --limit 50

What changes in the output:

  • Every ssn term has its digits replaced with hash-derived digits, preserving dashes (123-45-6789 becomes something like 847-23-9182). Length and separator positions are unchanged.
  • Every dob term has its digits replaced the same way, preserving slashes, dashes, or dots (01/15/1985 becomes something like 93/47/2068, January 15, 1985 becomes January XX, XXXX). Date validity is not preserved; the agent doesn't need it.
  • Every occurrence of those plaintext terms inside context.surrounding (and line / paragraph under --context full) is rewritten with the same redacted form, so a snippet like "Employee SSN: 123-45-6789 filed..." becomes "Employee SSN: 847-23-9182 filed...". The agent still reads the surrounding sentence and can decide.
  • Redaction works at two levels for context strings:
    1. The substitution table is seeded with every distinct ssn/dob term recorded across the whole scan, not just the paginated subset. So an SSN that lives in match A's surrounding text but is itself match #500 (outside --limit 50) still gets redacted.
    2. Each context string is additionally swept with the SSN/DOB detection regexes. Anything matching the dashed SSN pattern (XXX-XX-XXXX) or a date shape (US/ISO/month-name) gets redacted on the fly, even if it was never recorded as a match. This catches placeholders the original scanner filtered out (famous test SSNs like 987-65-4321, formats the anchored detector didn't trust) before they can leak.
  • All other PII types (email, phone, credit card, etc.) pass through unchanged. Redaction is intentionally narrow — broaden it only when a real triage workflow requires it.

The mapping is deterministic per database: a 32-byte secret is generated the first time --redact is used, stored in the local kv_store, and reused for every subsequent run. That means:

  • The same 123-45-6789 always redacts to the same value within your database, so the agent still sees "this exact value repeats 400 times across the scan" — the placeholder signal you actually want.
  • Two different databases produce different mappings, so a redacted output leaked from one machine is useless for fingerprinting values in another.
  • The secret never leaves the machine. If the local SQLite file is compromised, the redaction is moot — but in that case the scan itself is the bigger problem.
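To make the mechanics concrete, here is a hypothetical sketch of hash-derived, format-preserving digit replacement. This is not PII Crawler's actual implementation, only the shape the bullets above describe:

import hashlib, hmac
from itertools import cycle

def redact_digits(term: str, secret: bytes) -> str:
    # Derive a digit stream from HMAC(secret, term), then replace each digit
    # in place, leaving dashes, slashes, and dots untouched.
    stream = cycle(hmac.new(secret, term.encode(), hashlib.sha256).digest())
    return "".join(str(next(stream) % 10) if ch.isdigit() else ch
                   for ch in term)

Because the stream depends only on the secret and the term, the same value redacts identically everywhere in the scan, and a different database secret yields an unrelated mapping.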

What redaction does not protect

  • file_path. Paths often contain identifying information (/srv/hr/jane-doe-i9.pdf) and pass through unchanged. If your paths embed PII, sanitise them upstream or strip them from the JSON before sending (see the sketch after this list).
  • Other PII types. Email addresses, phone numbers, names, and credit card numbers are not redacted in v1. If your context contains these and you can't run a local model, post-process the JSON before it leaves the machine.
  • SSNs in non-dashed form. The context-level regex pass covers the dashed XXX-XX-XXXX shape only. Bare 9-digit SSNs (123456789) are redacted when they were themselves recorded as matches in the scan, but a bare 9-digit SSN that only appears inside another match's surrounding text and was never recorded is not caught — the bare 9-digit pattern matches too many non-SSN integer sequences to safely redact unanchored.
  • Inferred values from surrounding sentences. If a context snippet says "Jane Doe, born January 1985," the model still sees the name and the partial date even when the explicit dob term is rewritten. Redaction is structural, not semantic.
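If paths are the concern, the pre-send scrub can be as small as this sketch (strip_paths is a hypothetical helper to run before the JSON leaves the machine):

def strip_paths(matches):
    # Drop path fields that may embed PII before context is sent to a model.
    return [{k: v for k, v in m.items() if k != "file_path"} for m in matches]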

Agent constraints when redaction is on

The agent must write back via match-scope selectors only:

[{ "match_id": 42, "verdict": "fp" }]

Text-scope (--text ssn 123-45-6789) round-trips the term through the database, and the database only knows the original. A redacted term won't match anything. File-scope (--file 7) still works because it references a numeric file ID. In practice, an LLM agent inspecting one finding at a time should be using match-scope anyway.

findings stats and findings unmark --match work normally with redacted runs — the verdict and stats tables operate on IDs, not terms.

Tuning the agent

A few knobs that change cost and quality:

  • Pick a context mode that fits the budget. --context surrounding (the default) is the right balance for most agents. --context full adds the line and paragraph the match was extracted from, useful for narrative documents where one sentence isn't enough. --context none is useful for cheap counts and second-pass classifiers that only need IDs.
  • Filter by PII type. A prompt that triages only ssn can be sharper than a prompt that handles every type. Run multiple passes with different prompts: --pii-type ssn, then --pii-type email, then everything else.
  • Page in batches. Don't pull 10,000 matches in one findings list. A batch of 25 to 100 fits comfortably in a single LLM call and keeps the cost of a bad classification small.
  • Spot-check the agent's verdicts. Open the TUI's findings view, press h to show false positives, and skim the FP rows the agent produced. unmark lets you correct anything that looks wrong without rerunning the whole pass, as shown below.
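Correcting one bad verdict looks like this; --verdict is required because the FP and TP rows are stored separately:

piicrawler findings unmark --scan 42 --match 101 --verdict fp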
