Usage

Last updated June 2026

This page is the interface-agnostic reference for creating a scan and reviewing its results. The same scan options are available whether you launch PII Crawler as a TUI, a Web UI, or a CLI: all three are views of the same engine and the same on-disk database.

If you haven't installed and registered yet, start with the Quickstart.

Pick an interface

Interface | Launch with            | Best for
----------|------------------------|---------
TUI       | piicrawler (no args)   | Day-to-day interactive use over SSH or in a local terminal
Web       | piicrawler serve       | Sharing a scan with a teammate on the same network, browser-based triage
CLI       | piicrawler scan <path> | Automation, CI, scripting, piping output into another tool

The TUI and the Web UI write to the same local database, so a scan started in one appears in the other.

Create a scan

TUI

From the scan list, press n to open the new-scan form.

Web UI

Run piicrawler serve, open http://localhost:3001, and click Create New Scan.

Command line

piicrawler scan ~/Documents --workers 8 --out findings.jsonl

The scan keyword is optional — piicrawler ~/Documents works the same. See the CLI Reference for every flag.
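For automation, the command above can be assembled and launched from a script. The following is a minimal Python sketch, not an official wrapper: build_scan_command and run_scan are hypothetical helper names, and the only flags used are the ones named on this page (--workers, --out, --no-ocr). It assumes piicrawler is on PATH; the meaning of the exit code is not documented here, so the sketch only passes it through.

```python
import shlex
import subprocess
from typing import List, Optional


def build_scan_command(path: str, workers: Optional[int] = None,
                       out: Optional[str] = None, ocr: bool = True) -> List[str]:
    """Assemble a `piicrawler scan` argument list from flags named on this page."""
    cmd = ["piicrawler", "scan", path]
    if workers is not None:
        cmd += ["--workers", str(workers)]
    if out is not None:
        cmd += ["--out", out]
    if not ocr:
        cmd.append("--no-ocr")  # OCR is on by default, so only the opt-out is emitted
    return cmd


def run_scan(cmd: List[str]) -> int:
    """Run the scan and return its exit code (assumes piicrawler is on PATH)."""
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    cmd = build_scan_command("/srv/share", workers=8, out="findings.jsonl")
    print(shlex.join(cmd))
```

Because the argument list is passed without a shell, a leading ~ is not expanded; pass an absolute path or expand it first with os.path.expanduser.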

Scan options

Every interface exposes the same options. The only required field is the path to scan; everything below is optional.

  • Path — file, directory, or archive (.zip, .tar.gz, etc.) to scan. Directory scans walk the tree in parallel.
  • Workers — number of files scanned concurrently. Defaults to 4; raise it on machines with more cores or fast storage.
  • OCR — extract text from images and scanned PDFs. On by default; turn it off for a faster scan over mostly-text trees (--no-ocr on the CLI).
  • File type filter — restrict the scan to specific file types (e.g. only PDFs and emails) when you don't want to scan everything.
  • PII types — restrict detection to specific PII data types (e.g. only SSNs and credit cards).
  • Terms list — a list of exact-match strings to flag in addition to the built-in detectors. Useful for individual identifiers (a specific employee ID, a customer's name) during a DSAR or incident investigation.
  • Custom regex — your own regex patterns for data shapes the built-in detectors don't cover.
  • Proximity regex groups — a set of regex patterns that must all match within a configurable distance, for data that is only meaningful when found together (e.g. a name near a date of birth).
  • Exclusion patterns — paths to skip (build artifacts, vendored dependencies, virtual environments).
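To make the proximity option concrete, here is a small Python sketch of the general idea: a group fires only when every pattern matches and one match per pattern fits inside a bounded window. This is an illustration, not PII Crawler's implementation. The function name proximity_match is hypothetical, and measuring distance in characters from the earliest match start to the latest match end is an assumption; the product may measure distance differently.

```python
import re
from itertools import product
from typing import List


def proximity_match(text: str, patterns: List[str], max_distance: int) -> bool:
    """Return True if every pattern matches and one match per pattern
    fits inside a window of at most `max_distance` characters."""
    if not patterns:
        return False
    spans_per_pattern = []
    for pattern in patterns:
        spans = [m.span() for m in re.finditer(pattern, text)]
        if not spans:
            return False  # the group only fires when all patterns match
        spans_per_pattern.append(spans)

    # Brute force over one match per pattern; acceptable for a sketch on short text.
    for combo in product(*spans_per_pattern):
        start = min(s for s, _ in combo)
        end = max(e for _, e in combo)
        if end - start <= max_distance:
            return True
    return False


if __name__ == "__main__":
    text = "Name: Jane Doe, DOB: 1990-04-12"
    patterns = [r"Name:\s+\w+", r"\d{4}-\d{2}-\d{2}"]
    print(proximity_match(text, patterns, max_distance=40))
```

A name pattern and a date pattern that each match somewhere in a long file do not trigger the group unless they also land close together, which is what keeps this option quieter than two independent custom regexes.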

Review the findings

Findings stream in as files complete — you don't need to wait for the scan to finish.

  • TUI — press Enter on a scan to drill into its findings, f to open the Findings view, r to enter Review mode for fast keyboard triage.
  • Web UI — click View Files on a scan to filter, search, sort, and export to CSV. Click the Duration column header to sort by per-file scan time so the slowest files surface at the top — useful when one file is holding up the queue. Click again to reverse the order, a third time to clear it.
  • CLI — JSON streams to stdout (or to --out) as findings are produced. Use --format csv for a flat report.
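Since the CLI streams findings as they are produced, a downstream script can consume them incrementally. The sketch below assumes the stream is one JSON object per line, which the .jsonl extension in the earlier example suggests but this page does not state. The field names "path" and "pii_type" are placeholders chosen for illustration; check the real output for the actual schema.

```python
import io
import json
from collections import Counter
from typing import IO, Iterator


def iter_findings(stream: IO[str]) -> Iterator[dict]:
    """Yield one dict per non-blank line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by(stream: IO[str], key: str) -> Counter:
    """Tally findings by a field; records missing the field count as 'unknown'."""
    return Counter(str(record.get(key, "unknown")) for record in iter_findings(stream))


if __name__ == "__main__":
    # Made-up records; "path" and "pii_type" are placeholder field names.
    sample = io.StringIO(
        '{"path": "a.txt", "pii_type": "ssn"}\n'
        '{"path": "b.txt", "pii_type": "credit_card"}\n'
        '{"path": "c.txt", "pii_type": "ssn"}\n'
    )
    print(count_by(sample, "pii_type"))
```

To read from a live scan, pass sys.stdin in place of the sample stream and pipe the CLI's output into the script.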

Browse findings as a directory tree

The Web UI's Tree button (next to View Files on the scan page) opens a directory-style browser that aggregates per-folder PII counts. Each row shows the folder's descendant file count, its non-false-positive match count, and a per-PII-type breakdown that sums everything underneath. This is the fastest way to answer "which part of the share has the bulk of the SSNs?" without paging through individual files.

Click a folder to drill in, click the path segments at the top to jump back to any ancestor, or click a folder's match-count number to open the file list pre-filtered to that path. Files appear as leaves and link straight through to the file detail page. Archive contents (.zip entries) appear as a folder named after the archive so you can navigate into them the same way.

False-positive matches at any scope (match, text, or file) are excluded from the rolled-up counts, so the tree mirrors what the rest of the Web UI shows after you've triaged.
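The roll-up described above can be pictured as summing each match into every ancestor folder of its file while skipping false positives. The Python sketch below shows that idea under simplifying assumptions: it is not the Web UI's actual query, the rollup function and the (path, PII type, is_false_positive) tuple are hypothetical, and the three false-positive scopes are collapsed into a single boolean per match.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, Tuple

Match = Tuple[str, str, bool]  # (file path, PII type, is_false_positive)


def rollup(matches: Iterable[Match]) -> Dict[str, Dict[str, int]]:
    """Sum non-false-positive matches per PII type for every ancestor folder."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for path, pii_type, is_false_positive in matches:
        if is_false_positive:
            continue  # triaged-out matches never reach the folder totals
        for folder in PurePosixPath(path).parents:
            totals[str(folder)][pii_type] += 1
    return {folder: dict(counts) for folder, counts in totals.items()}


if __name__ == "__main__":
    matches = [
        ("share/hr/a.pdf", "ssn", False),
        ("share/hr/b.pdf", "ssn", True),   # marked false positive, so ignored
        ("share/fin/c.csv", "credit_card", False),
    ]
    for folder, counts in sorted(rollup(matches).items()):
        print(folder, counts)
```

With relative paths, PurePosixPath reports the top-level ancestor as ".", so that key holds the totals for the whole scan.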

Once you have findings, the next step is separating real positives from noise. See Triaging Findings for the verdict model and the keyboard-driven review workflow.
