CLI Reference
Overview
piicrawler is a single binary that ships in three modes:
- An interactive terminal UI (no arguments)
- A web UI server (piicrawler serve)
- A set of command-line subcommands for one-off scans, real-time monitoring, DSAR lookups, and HTML report generation
Run piicrawler help (or -h / --help) to print the same reference at any time.
Synopsis
piicrawler                                              Launch interactive TUI
piicrawler <path> [--workers <n>] [--quiet] [--no-ocr]  Scan a file or directory
piicrawler serve [port]                                 Start web UI (default port 3001)
piicrawler watch <path>... [options]                    Monitor directories for PII in real-time
piicrawler dsar <name> [options]                        Search for a person's PII across all scans
piicrawler report <scan_id>                             Generate an HTML report for a scan
piicrawler help                                         Show the built-in help message
Commands
(no arguments) — Interactive TUI
piicrawler
Launches the interactive terminal UI for browsing scans, viewing findings, and managing the local database. This is the default mode when you run the binary with no arguments.
<path> — One-shot scan
piicrawler <path> [--workers <n>] [-j <n>] [--quiet] [--no-ocr]
Scans a single file, an archive container (e.g. .zip, .tar.gz), or a directory tree, and writes results to stdout as pretty-printed JSON. Progress messages are written to stderr so you can pipe stdout safely:
piicrawler ~/Documents > findings.json
Requires a registered installation. If no license is present, the command exits with an error directing you to register via piicrawler serve.
Options:
- --workers <n>, -j <n>: Number of worker threads to use when scanning a directory. Defaults to 4. Capped at the number of files found.
- --quiet: Suppress the per-file [start ...] / [done ...] progress lines on stderr.
- --no-ocr: Skip OCR on images and scanned PDFs. Speeds up scans of mostly-text trees.
Behaviour:
- File: extracts text, runs PII detection, prints a single ScanResult JSON object.
- Container: extracts each entry and scans it, prints an array of ScanResult objects.
- Directory: recursively walks the tree (skipping symlinks) and scans every supported file type or container in parallel, prints an array of ScanResult objects.
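Because a single-file scan prints one ScanResult object while container and directory scans print an array, a script that consumes stdout may want to treat both cases uniformly. The following is a minimal Python sketch, assuming the output is shaped exactly as described above; the helper name normalize_scan_output is hypothetical and not part of piicrawler.

```python
import json


def normalize_scan_output(raw: str) -> list[dict]:
    """Return a list of ScanResult dicts for either output shape."""
    parsed = json.loads(raw)
    if isinstance(parsed, dict):
        # Single-file scan: stdout holds one ScanResult object.
        return [parsed]
    if isinstance(parsed, list):
        # Container or directory scan: stdout holds an array of ScanResult objects.
        return parsed
    raise ValueError(f"unexpected top-level JSON type: {type(parsed).__name__}")
```

The raw string could come from, for example, subprocess.run(["piicrawler", path, "--quiet"], capture_output=True, text=True).stdout, since progress messages go to stderr.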
serve — Web UI
piicrawler serve [port]
Starts an HTTP server on the given port (default 3001) and serves the web UI at http://localhost:<port>. The web UI is where you create and manage scans, register your license, and review findings in a browser.
watch — Real-time monitoring
piicrawler watch <path>... [--webhook <url>] [--policy <file>] [--no-json] [--no-ocr] [--debounce <ms>]
Watches one or more directories for file system changes and scans newly created or modified files for PII as they appear. Results are streamed as JSON to stdout by default and recorded in the local database. Press Ctrl+C to stop the daemon.
Options:
- --webhook <url>: POST findings to the given webhook URL as they are produced.
- --policy <file>: Load alert policies from a YAML file. Each policy can match on pii_type, path_pattern, and max_risk, and triggers an action and severity. Loaded policies are written to the database for the daemon to consume.
- --no-json: Disable the JSON stdout stream (use this when you only want webhook delivery or database persistence).
- --no-ocr: Skip OCR on images and scanned PDFs.
- --debounce <ms>: Debounce window for file events in milliseconds. Defaults to 500. Useful when editors save in bursts.
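To make the effect of --debounce concrete, here is a small Python sketch of trailing-edge debouncing. It is an illustration only: the function name, the event representation, and the trailing-edge behaviour are assumptions, and piicrawler's actual implementation may differ.

```python
def debounce(events: list[tuple[int, str]], window_ms: int = 500) -> list[tuple[int, str]]:
    """Collapse bursts of file events into one scan per quiet period.

    `events` is a list of (timestamp_ms, path) pairs sorted by timestamp.
    Returns (fire_time_ms, path) pairs: a scan fires for a path once
    `window_ms` has passed without another event for that path.
    """
    last_seen: dict[str, int] = {}
    fired: list[tuple[int, str]] = []
    for ts, path in events:
        prev = last_seen.get(path)
        if prev is not None and ts - prev >= window_ms:
            # The previous burst went quiet before this event arrived.
            fired.append((prev + window_ms, path))
        last_seen[path] = ts
    # Every path still pending fires once its final window elapses.
    fired.extend((ts + window_ms, path) for path, ts in last_seen.items())
    return sorted(fired)
```

Under this model, an editor that writes the same file at 0 ms, 100 ms, and 200 ms triggers a single scan at 700 ms with the default window, rather than three scans.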
dsar — Data Subject Access Request
piicrawler dsar "Person Name" [--assert-clean] [--report <file>] [--json]
Searches every recorded scan in the local database for PII associated with the given person and prints a summary to stderr. Use this to fulfil GDPR/CCPA right-to-know requests or to check whether a specific person's data has leaked into a watched location.
Options:
- --assert-clean: Exit with status 1 if any findings are returned (and 0 with a CLEAN: line if not). Designed for use in CI pipelines.
- --report <file>: Write a self-contained HTML report to <file>.
- --json: Print structured findings as JSON to stdout in addition to the stderr summary.
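In a CI job written in Python rather than shell, the --assert-clean exit status can be mapped to a boolean. This is a hedged sketch: dsar_is_clean and its injectable run parameter are hypothetical, and only the exit codes 0 and 1 are documented above, so any other status is treated as an error here by assumption.

```python
import subprocess
from typing import Callable


def dsar_is_clean(
    name: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if `piicrawler dsar <name> --assert-clean` reports no findings."""
    argv = ["piicrawler", "dsar", name, "--assert-clean"]
    result = run(argv, capture_output=True, text=True)
    if result.returncode == 0:
        return True  # documented: exit 0 with a CLEAN: line
    if result.returncode == 1:
        return False  # documented: exit 1 when findings are returned
    # Assumption: other exit codes are undocumented, so surface them as errors.
    raise RuntimeError(
        f"unexpected exit status {result.returncode}: {result.stderr.strip()}"
    )
```

Passing run as a parameter keeps the helper testable without the binary installed; in normal use the default subprocess.run is enough.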
report — HTML risk report
piicrawler report <scan_id>
Generates a standalone HTML risk report for the scan with the given numeric ID and writes it to piicrawler-report-<scan_id>.html in the current working directory. The scan ID can be found in the TUI or web UI.
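Scripts that archive or upload the report need to know where it was written. A minimal Python sketch of the naming rule described above follows; report_path is a hypothetical helper, not something piicrawler provides.

```python
from pathlib import Path
from typing import Optional


def report_path(scan_id: int, cwd: Optional[Path] = None) -> Path:
    """Return where `piicrawler report <scan_id>` is documented to write its output."""
    # The report lands in the working directory the command was run from.
    base = cwd if cwd is not None else Path.cwd()
    return base / f"piicrawler-report-{scan_id}.html"
```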
help
piicrawler help
piicrawler -h
piicrawler --help
Prints the built-in usage summary to stderr.
Output
piicrawler <path> prints a JSON document with one entry per scanned file: a single object for a single-file scan, or an array for container and directory scans. Each entry has the shape:
{
  "file_path": "/absolute/path/to/file.pdf",
  "findings": [ ... ],
  "full_names": [ ... ],
  "char_count": 12345,
  "error": null
}
If extraction fails for a file, error is set to a short message and findings is empty. Container scans return the same shape, one entry per archive member.
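As an illustration of working with this shape, the Python sketch below separates files that produced findings from files whose extraction failed. It relies only on the file_path, findings, and error fields shown above; the function name partition_results is an assumption made for this example.

```python
def partition_results(results: list[dict]) -> tuple[list[str], list[str]]:
    """Return (paths with findings, paths whose extraction failed)."""
    # A non-empty findings list means PII was detected in that file.
    with_findings = [r["file_path"] for r in results if r.get("findings")]
    # error is null (None) on success and a short message on failure.
    failed = [r["file_path"] for r in results if r.get("error") is not None]
    return with_findings, failed
```

Reporting the failed list separately avoids mistaking an unreadable file for a clean one, since both have empty findings.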
See PII Data Types for the structure of individual findings and Results Storage for the database schema used by serve, watch, and the TUI.
Environment variables
- PIICRAWLER_LOG_FILE: If set to a non-empty path, structured logs are appended to the given file in addition to being shown in the TUI Logs view. Useful when running watch or serve as a long-lived daemon.
Examples
Scan a directory and save the findings to a file:
piicrawler ~/Downloads --workers 8 > findings.json
Scan a single archive without OCR and pipe to jq:
piicrawler backups/2026-04.zip --no-ocr --quiet | jq '.[] | select(.findings | length > 0)'
Watch two directories with a webhook and a policy file:
piicrawler watch /srv/uploads /srv/exports \
  --webhook https://alerts.example.com/piicrawler \
  --policy ./policies.yml \
  --debounce 1000
Fail a CI job if any PII is found for a given person:
piicrawler dsar "Jane Doe" --assert-clean
Generate an HTML report for scan ID 42:
piicrawler report 42