Usage
This page is the interface-agnostic reference for creating a scan and reviewing its results. The same scan options are available however you launch PII Crawler: the TUI, the web UI, and the CLI are three views of the same engine and the same on-disk database.
If you haven't installed and registered yet, start with the Quickstart.
Pick an interface
| Interface | Launch with | Best for |
|---|---|---|
| TUI | piicrawler (no args) | Day-to-day interactive use over SSH or in a local terminal |
| Web | piicrawler serve | Sharing a scan with a teammate on the same network, browser-based triage |
| CLI | piicrawler scan <path> | Automation, CI, scripting, piping output into another tool |
The TUI and the web UI write to the same local database, so a scan started in one shows up in the other.
Create a scan
TUI
From the scan list, press n to open the new-scan form.
Web UI
Run piicrawler serve, open http://localhost:3001, and click Create New Scan.
Command line
piicrawler scan ~/Documents --workers 8 --out findings.jsonl
The scan keyword is optional — piicrawler ~/Documents works the same. See the CLI Reference for every flag.
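For scripted or CI use, it can help to assemble the command from a small wrapper rather than hand-editing strings. The sketch below is only an illustration: build_scan_command is a hypothetical helper, it uses just the flags named on this page (--workers, --no-ocr, --format, --out), and it assumes those flags can be combined freely. Check the CLI Reference before relying on it.

```python
import os
import shlex
from typing import List, Optional


def build_scan_command(
    path: str,
    workers: Optional[int] = None,
    ocr: bool = True,
    out: Optional[str] = None,
    fmt: Optional[str] = None,
) -> List[str]:
    """Assemble a `piicrawler scan` argv (hypothetical helper, not part of PII Crawler)."""
    cmd = ["piicrawler", "scan", os.path.expanduser(path)]
    if workers is not None:
        cmd += ["--workers", str(workers)]
    if not ocr:
        cmd.append("--no-ocr")  # OCR is on by default, so only pass the opt-out
    if fmt is not None:
        cmd += ["--format", fmt]
    if out is not None:
        cmd += ["--out", out]
    return cmd


if __name__ == "__main__":
    cmd = build_scan_command("~/Documents", workers=8, out="findings.jsonl")
    print(shlex.join(cmd))
    # To actually launch the scan (requires piicrawler on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building a list instead of a single string avoids shell-quoting problems when a path contains spaces.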
Scan options
Every interface exposes the same options. The only required field is the path to scan; everything below is optional.
- Path: file, directory, or archive (.zip, .tar.gz, etc.) to scan. Directory scans walk the tree in parallel.
- Workers: number of files scanned concurrently. Defaults to 4; raise it on machines with more cores or fast storage.
- OCR: extract text from images and scanned PDFs. On by default; turn it off for a faster scan over mostly-text trees (--no-ocr on the CLI).
- File type filter: restrict the scan to specific file types (e.g. only PDFs and emails) when you don't want to scan everything.
- PII types: restrict detection to specific PII data types (e.g. only SSNs and credit cards).
- Terms list: a list of exact-match strings to flag in addition to the built-in detectors. Useful for individual identifiers (a specific employee ID, a customer's name) during a DSAR or incident investigation.
- Custom regex: your own regex patterns for data shapes the built-in detectors don't cover.
- Proximity regex groups: a set of regex patterns that must all match within a configurable distance, for data that is only meaningful when found together (e.g. a name near a date of birth).
- Exclusion patterns: paths to skip (build artifacts, vendored dependencies, virtual environments).
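To make the proximity option concrete, here is a minimal sketch of the idea in Python. It is not PII Crawler's implementation. The function name, the brute-force search, and the definition of distance (characters from the earliest match start to the latest match end) are all assumptions made for illustration.

```python
import re
from itertools import product
from typing import List, Optional, Tuple

Span = Tuple[int, int]


def find_proximity_group(
    text: str, patterns: List[str], max_distance: int
) -> Optional[List[Span]]:
    """Return one span per pattern if every pattern matches inside a window
    of at most max_distance characters; otherwise return None."""
    per_pattern = [[m.span() for m in re.finditer(p, text)] for p in patterns]
    if any(not spans for spans in per_pattern):
        return None  # at least one pattern never matches, so the group fails
    # Brute force: try every combination of one match per pattern.
    for combo in product(*per_pattern):
        window_start = min(start for start, _ in combo)
        window_end = max(end for _, end in combo)
        if window_end - window_start <= max_distance:
            return list(combo)
    return None


if __name__ == "__main__":
    text = "Patient: Jane Doe, DOB 1984-03-12. Unrelated note follows."
    patterns = [r"Jane Doe", r"\b\d{4}-\d{2}-\d{2}\b"]
    print(find_proximity_group(text, patterns, max_distance=40))
    print(find_proximity_group(text, patterns, max_distance=10))
```

With the wider window the name and the date are reported together; with the narrow one neither is, even though each pattern matches on its own.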
Review the findings
Findings stream in as files complete — you don't need to wait for the scan to finish.
- TUI: press Enter on a scan to drill into its findings, f to open the Findings view, or r to enter Review mode for fast keyboard triage.
- Web UI: click View Files on a scan to filter, search, sort, and export to CSV. Click the Duration column header to sort by per-file scan time so the slowest files surface at the top, which is useful when one file is holding up the queue. Click again to reverse the order, and a third time to clear it.
- CLI: JSON streams to stdout (or to the file given by --out) as findings are produced. Use --format csv for a flat report.
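Because the CLI emits findings as they are produced, a downstream script can process them line by line instead of waiting for the scan to end. The sketch below assumes the output is JSON Lines (one object per line, as the findings.jsonl file name suggests). The pii_type and path fields are made up for illustration, since the real schema is not described on this page.

```python
import io
import json
from collections import Counter
from typing import IO, Iterator


def iter_findings(stream: IO[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_type(stream: IO[str], type_key: str = "pii_type") -> Counter:
    """Tally findings by type. `pii_type` is a hypothetical field name."""
    return Counter(f.get(type_key, "unknown") for f in iter_findings(stream))


if __name__ == "__main__":
    # Made-up sample records; the real output schema may differ.
    sample = io.StringIO(
        '{"path": "a.pdf", "pii_type": "ssn"}\n'
        '{"path": "b.txt", "pii_type": "credit_card"}\n'
        "\n"
        '{"path": "c.eml", "pii_type": "ssn"}\n'
    )
    print(count_by_type(sample))
```

To read from a pipe instead of the sample, pass sys.stdin to count_by_type and pipe the scan's stdout into the script.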
Browse findings as a directory tree
The Web UI's Tree button (next to View Files on the scan page) opens a directory-style browser that aggregates per-folder PII counts. Each row shows the folder's descendant file count, its non-false-positive match count, and a per-PII-type breakdown that sums everything underneath. This is the fastest way to answer "which part of the share has the bulk of the SSNs?" without paging through individual files.
Click a folder to drill in, click the path segments at the top to jump back to any ancestor, or click a folder's match-count number to open the file list pre-filtered to that path. Files appear as leaves and link straight through to the file detail page. Archive contents (.zip entries) appear as a folder named after the archive so you can navigate into them the same way.
False-positive matches at any scope (match, text, or file) are excluded from the rolled-up counts, so the tree mirrors what the rest of the Web UI shows after you've triaged.
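The roll-up described above can be sketched in a few lines of Python. This is an illustration of the aggregation only, not the Web UI's code. The input shape (file path, PII type, false-positive flag) is hypothetical, and for simplicity only files that appear in at least one match record are counted.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, Set, Tuple

Match = Tuple[str, str, bool]  # (file path, PII type, is_false_positive)


def roll_up(matches: Iterable[Match]) -> Dict[str, dict]:
    """Aggregate per-folder file counts and non-false-positive match counts."""
    files: Dict[str, Set[str]] = defaultdict(set)
    by_type: Dict[str, Counter] = defaultdict(Counter)
    for path, pii_type, is_fp in matches:
        # Every ancestor folder (up to the root, shown as ".") gets credit.
        for folder in PurePosixPath(path).parents:
            key = str(folder)
            files[key].add(path)
            if not is_fp:  # false positives never reach the rolled-up counts
                by_type[key][pii_type] += 1
    return {
        folder: {
            "files": len(paths),
            "matches": sum(by_type[folder].values()),
            "by_type": dict(by_type[folder]),
        }
        for folder, paths in files.items()
    }


if __name__ == "__main__":
    sample = [
        ("share/hr/a.pdf", "ssn", False),
        ("share/hr/a.pdf", "ssn", True),   # triaged as a false positive
        ("share/hr/b.txt", "email", False),
        ("share/fin/c.csv", "ssn", False),
    ]
    for folder, row in sorted(roll_up(sample).items()):
        print(folder, row)
```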
Once you have findings, the next step is separating real positives from noise. See Triaging Findings for the verdict model and the keyboard-driven review workflow.