CLI Reference

Last updated April 2026

Overview

piicrawler is a single binary that runs in three modes:

An interactive terminal UI (no arguments)
A web UI server (piicrawler serve)
A set of command line subcommands for one-off scans, real-time monitoring, DSAR lookups, and HTML report generation

Run piicrawler help (or -h / --help) to print the built-in usage summary at any time.

Synopsis

piicrawler                                              Launch interactive TUI
piicrawler <path> [--workers <n>] [--quiet] [--no-ocr]  Scan a file or directory
piicrawler serve [port]                                 Start web UI (default port 3001)
piicrawler watch <path>... [options]                    Monitor directories for PII in real time
piicrawler dsar <name> [options]                        Search for a person's PII across all scans
piicrawler report <scan_id>                             Generate an HTML report for a scan
piicrawler help                                         Show the built-in help message

Commands

(no arguments) — Interactive TUI

piicrawler

Launches the interactive terminal UI for browsing scans, viewing findings, and managing the local database. This is the default mode when you run the binary with no arguments.

`<path>` — One-shot scan

piicrawler <path> [--workers <n> | -j <n>] [--quiet] [--no-ocr]

Scans a single file, an archive container (e.g. .zip, .tar.gz), or a directory tree, and writes results to stdout as pretty-printed JSON. Progress messages are written to stderr so you can pipe stdout safely:

piicrawler ~/Documents > findings.json
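Because the JSON goes to stdout and progress goes to stderr, a script can capture one stream and leave the other alone. The following is a minimal sketch of that pattern, not an official client: `run_scan` is a hypothetical helper, and a small Python one-liner stands in for the piicrawler binary so the example runs anywhere. `check=True` assumes that a non-zero exit status signals failure, which this reference states only for the unregistered case.

```python
import json
import subprocess
import sys


def run_scan(command):
    """Run a scan command and parse its stdout as JSON.

    stderr is not captured, so progress lines still reach the terminal.
    """
    completed = subprocess.run(
        command, stdout=subprocess.PIPE, check=True, text=True
    )
    return json.loads(completed.stdout)


# Stand-in for `piicrawler <path>` so the sketch runs without the binary:
# it writes a progress line to stderr and an empty JSON array to stdout.
fake_scan = [
    sys.executable,
    "-c",
    "import sys; print('[start a.txt]', file=sys.stderr); print('[]')",
]
results = run_scan(fake_scan)
```

With the real binary installed and registered, the call would look like `run_scan(["piicrawler", "/some/path", "--quiet"])`.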

Requires a registered installation. If no license is present, the command exits with an error directing you to register via piicrawler serve.

Options:

--workers <n>, -j <n> — Number of worker threads to use when scanning a directory. Defaults to 4. Capped at the number of files found.
--quiet — Suppress the per-file [start ...] / [done ...] progress lines on stderr.
--no-ocr — Skip OCR on images and scanned PDFs. Speeds up scans of mostly-text trees.

Behaviour:

File: extracts text, runs PII detection, prints a single ScanResult JSON object.
Container: extracts each entry and scans it, prints an array of ScanResult objects.
Directory: recursively walks the tree (skipping symlinks) and scans every supported file type or container in parallel, prints an array of ScanResult objects.

`serve` — Web UI

piicrawler serve [port]

Starts an HTTP server on the given port (default 3001) and serves the web UI at http://localhost:<port>. The web UI is where you create and manage scans, register your license, and review findings in a browser.

`watch` — Real-time monitoring

piicrawler watch <path>... [--webhook <url>] [--policy <file>] [--no-json] [--no-ocr] [--debounce <ms>]

Watches one or more directories for file system changes and scans newly created or modified files for PII as they appear. Results are streamed as JSON to stdout by default and recorded in the local database. Press Ctrl+C to stop the daemon.

Options:

--webhook <url> — POST findings to the given webhook URL as they are produced.
--policy <file> — Load alert policies from a YAML file. Each policy can match on pii_type, path_pattern, and max_risk, and triggers an action and severity. Loaded policies are written to the database for the daemon to consume.
--no-json — Disable the JSON stdout stream (use this when you only want webhook delivery or database persistence).
--no-ocr — Skip OCR on images and scanned PDFs.
--debounce <ms> — Debounce window for file events in milliseconds. Defaults to 500. Useful when editors save in bursts.
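To make the --debounce option more concrete, here is a minimal sketch of one common debouncing scheme (trailing edge, per path): a burst of events for the same file collapses into a single trigger once no new event has arrived for the full window. How piicrawler actually debounces is not documented here, so the function name, the event format, and the strategy itself are assumptions for illustration.

```python
def debounce(events, window_ms=500):
    """Collapse bursts of (timestamp_ms, path) events.

    Returns one (fire_time_ms, path) pair per burst, where a burst ends
    once no further event for that path arrives within `window_ms`.
    """
    last_seen = {}  # path -> timestamp of the latest event in the open burst
    fired = []
    for ts, path in sorted(events):
        previous = last_seen.get(path)
        if previous is not None and ts - previous >= window_ms:
            # The earlier burst stayed quiet for a full window, so it fires.
            fired.append((previous + window_ms, path))
        last_seen[path] = ts
    # Each path still has one open burst, which fires after its last event.
    fired.extend((ts + window_ms, path) for path, ts in last_seen.items())
    return sorted(fired)


# An editor saving "a" three times in quick succession, then once more later.
events = [(0, "a"), (100, "a"), (200, "a"), (1000, "a"), (50, "b")]
short_window = debounce(events, window_ms=500)
long_window = debounce(events, window_ms=1000)
```

With the 500 ms window, the three rapid saves of "a" produce one trigger and the later save produces another. With a 1000 ms window, all four saves merge into a single trigger.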

`dsar` — Data Subject Access Request

piicrawler dsar "Person Name" [--assert-clean] [--report <file>] [--json]

Searches every recorded scan in the local database for PII associated with the given person and prints a summary to stderr. Use this to fulfil GDPR/CCPA right-to-know requests or to check whether a specific person's data has leaked into a watched location.

Options:

--assert-clean — Exit with status 1 if any findings are returned (and 0 with a CLEAN: line if not). Designed for use in CI pipelines.
--report <file> — Write a self-contained HTML report to <file>.
--json — Print structured findings as JSON to stdout in addition to the stderr summary.
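The exit-status contract of --assert-clean can be wrapped in a small gate for scripted pipelines. The sketch below is illustrative: `person_is_clean` is a hypothetical helper, the `run` parameter exists only so the function can be exercised without the binary, and treating any status other than 0 or 1 as an unexpected failure is an assumption, since this reference documents only those two.

```python
import subprocess


def person_is_clean(name, run=subprocess.run):
    """Return True when `piicrawler dsar <name> --assert-clean` exits 0.

    Exit status 1 means findings were returned. Any other status is
    treated as an unexpected failure and raised (an assumption).
    """
    completed = run(["piicrawler", "dsar", name, "--assert-clean"])
    if completed.returncode == 0:
        return True
    if completed.returncode == 1:
        return False
    raise RuntimeError(f"unexpected exit status {completed.returncode}")
```

A CI step could call `person_is_clean("Jane Doe")` and fail the job when it returns False.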

`report` — HTML risk report

piicrawler report <scan_id>

Generates a standalone HTML risk report for the scan with the given numeric ID and writes it to piicrawler-report-<scan_id>.html in the current working directory. The scan ID can be found in the TUI or web UI.

`help`

piicrawler help
piicrawler -h
piicrawler --help

Prints the built-in usage summary to stderr.

Output

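piicrawler <path> prints a JSON document: a single object when the target is one file, or an array with one entry per scanned file when the target is a container or directory. Each entry has the shape: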

{
  "file_path": "/absolute/path/to/file.pdf",
  "findings": [ ... ],
  "full_names": [ ... ],
  "char_count": 12345,
  "error": null
}

If extraction fails for a file, error is set to a short message and findings is empty. Container scans return the same shape, one entry per archive member.
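A consumer of this output has to handle both a lone object (single-file scan) and an array (container or directory scan). The sketch below shows one way to normalise the two and separate files with findings from files that failed extraction. The helper names are hypothetical, and the contents of the sample `findings` list are placeholders, since the structure of individual findings is described elsewhere.

```python
import json


def load_results(text):
    """Parse scan output and always return a list of entries.

    A single-file scan prints one object, so it is wrapped in a list.
    """
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def summarize(entries):
    """Split entries into files with findings and files that failed extraction."""
    with_findings = [e["file_path"] for e in entries if e["findings"]]
    failed = {e["file_path"]: e["error"] for e in entries if e["error"] is not None}
    return with_findings, failed


# Made-up sample data in the documented entry shape.
sample = json.dumps([
    {"file_path": "/data/a.pdf", "findings": [{"placeholder": True}],
     "full_names": [], "char_count": 120, "error": None},
    {"file_path": "/data/b.png", "findings": [],
     "full_names": [], "char_count": 0, "error": "extraction failed"},
])
hits, failures = summarize(load_results(sample))
```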

See PII Data Types for the structure of individual findings and Results Storage for the database schema used by serve, watch, and the TUI.

Environment variables

PIICRAWLER_LOG_FILE — If set to a non-empty path, structured logs are appended to the given file in addition to being shown in the TUI Logs view. Useful when running watch or serve as a long-lived daemon.
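When a supervisor script launches watch or serve, the variable can be set for the child process only, without changing the parent environment. This is a minimal sketch; `daemon_env` is a hypothetical helper and the log path in the usage note is an example, not a documented default.

```python
import os


def daemon_env(log_file, base=None):
    """Copy an environment and set PIICRAWLER_LOG_FILE for a child process."""
    if not log_file:
        # The variable only takes effect when set to a non-empty path.
        raise ValueError("log_file must be a non-empty path")
    env = dict(os.environ if base is None else base)
    env["PIICRAWLER_LOG_FILE"] = str(log_file)
    return env
```

The result can be passed to a process launcher, for example `subprocess.Popen(["piicrawler", "watch", "/srv/uploads"], env=daemon_env("/var/log/piicrawler.log"))`.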

Examples

Scan a directory and save the findings to a file:

piicrawler ~/Downloads --workers 8 > findings.json

Scan a single archive without OCR and pipe to jq:

piicrawler backups/2026-04.zip --no-ocr --quiet | jq '.[] | select(.findings | length > 0)'

Watch two directories with a webhook and a policy file:

piicrawler watch /srv/uploads /srv/exports \
  --webhook https://alerts.example.com/piicrawler \
  --policy ./policies.yml \
  --debounce 1000
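For local testing of the --webhook option, a throwaway receiver can stand in for the real alerting endpoint. The sketch below uses only the Python standard library and assumes the POST body is JSON. The payload schema is not documented in this reference, so the handler just stores whatever it parses. `WebhookHandler`, `make_server`, and `received` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # parsed request bodies, kept in memory for the demo


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            received.append(json.loads(body))
            self.send_response(204)  # accepted, nothing to return
        except ValueError:
            # Covers malformed JSON and undecodable bytes.
            self.send_response(400)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def make_server(port=0):
    """Bind to localhost only; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

Calling `make_server(8080).serve_forever()` and pointing --webhook at http://127.0.0.1:8080/ would then collect whatever the daemon posts.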

Fail a CI job if any PII is found for a given person:

piicrawler dsar "Jane Doe" --assert-clean

Generate an HTML report for scan ID 42:

piicrawler report 42
