
Introduction

Last updated April 2026

PII Crawler is a local-first scanner that looks through your files, mailboxes, database exports, and network shares for Personally Identifiable Information (PII) and other sensitive data. Use it to:

  • Secure, encrypt, or redact data before it leaks
  • Investigate an incident and understand exposure
  • Build a Data Leak Prevention (DLP) workflow
  • Meet data compliance obligations (GDPR, CCPA, and similar)

PII Crawler is influenced by the Unix philosophy and tries to do one thing well: find sensitive data with a low false-positive rate. It is not a complete DLP tool, since it doesn't send alerts or push notifications on its own, but it composes well with one. Scans run entirely on your machine, and results never leave it. See Security for the full network-traffic disclosure.

How you run it

PII Crawler ships as a single cross-platform binary for macOS, Windows, and Linux. The same binary gives you three interfaces:

  • Interactive TUI: run piicrawler with no arguments for a keyboard-driven terminal UI
  • Web UI: run piicrawler serve and open http://localhost:3001
  • Command line: run piicrawler scan, watch, dsar, and report for one-shot or scripted use

Install in under five minutes →

Supported PII Data Types

PII Crawler ships with high-fidelity detectors for:

  • U.S. Social Security Number (SSN)
  • First and last names
  • Street address and city / state / ZIP code clusters
  • Email address
  • Date of birth
  • U.S. passport number
  • Credit card number (Luhn-validated)
  • Driver's license number
  • AWS credentials
  • User-defined exact-match terms and custom regex patterns
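
Validation steps like the Luhn check are what keep a pattern detector's false-positive rate low. The sketch below illustrates the general technique and is not PII Crawler's detector code: regular expressions find candidate card numbers and SSNs, then a candidate is kept only if it passes the Luhn (mod 10) checksum or the Social Security Administration's rules for numbers it never issues. The patterns and function names are hypothetical.

```python
import re

# Hypothetical candidate patterns (not PII Crawler's): 13-19 digits with
# optional single space/dash separators, and SSNs in the AAA-GG-SSSS layout.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
SSN_CANDIDATE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn (mod 10) checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Walk right to left, doubling every second digit and folding values > 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def ssn_structurally_valid(area: str, group: str, serial: str) -> bool:
    """Reject numbers the SSA never issues: area 000, 666, or 900-999;
    group 00; serial 0000."""
    a = int(area)
    if a == 0 or a == 666 or a >= 900:
        return False
    return group != "00" and serial != "0000"


def find_cards(text: str) -> list[str]:
    """Return candidate card numbers (digits only) that pass the Luhn check."""
    hits = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits


def find_ssns(text: str) -> list[str]:
    """Return SSN-shaped strings that also pass the structural rules."""
    return [
        m.group()
        for m in SSN_CANDIDATE.finditer(text)
        if ssn_structurally_valid(*m.groups())
    ]
```

A random digit string passes the Luhn check only about one time in ten, so the checksum alone discards roughly 90% of look-alike numbers such as order or tracking IDs.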

See the full list with detection notes in PII Data Types, and add your own rules with custom regex or proximity groups.
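
The exact semantics of PII Crawler's proximity groups are covered on their own page. The sketch below assumes one simple interpretation for illustration only: a custom regex counts as a hit only when one of a set of context keywords appears within a fixed character window of the match. The `ProximityRule` class, its fields, the example rule, and the 40-character window are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ProximityRule:
    """Hypothetical rule: `pattern` is reported only if any of `keywords`
    occurs within `window` characters on either side of the match."""
    name: str
    pattern: re.Pattern
    keywords: tuple[str, ...]
    window: int = 40

    def find(self, text: str) -> list[str]:
        lowered = text.lower()
        hits = []
        for m in self.pattern.finditer(text):
            # Slice out the surrounding context, clamped to the text bounds.
            start = max(0, m.start() - self.window)
            end = min(len(text), m.end() + self.window)
            context = lowered[start:end]
            # Case-insensitive keyword check within the window.
            if any(k.lower() in context for k in self.keywords):
                hits.append(m.group())
        return hits


# Hypothetical example: an 8-digit number is reported only when a label
# such as "employee" or "staff id" appears nearby.
employee_id = ProximityRule(
    name="employee_id",
    pattern=re.compile(r"\b\d{8}\b"),
    keywords=("employee", "staff id"),
)
```

Requiring nearby context trades some recall for precision: an eight-digit number on its own is far too common to report, but next to a label like "employee" it is much more likely to be an identifier.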

Supported Data Sources

  • Files and directories: over 50 file types including PDF, Office (Word/Excel/PowerPoint), Apple iWork, OpenDocument, EPUB, HTML, plain text, and source code. See Supported File Types.
  • Images: text inside JPEG/PNG/TIFF/BMP images and scanned PDFs is extracted with built-in OCR.
  • Archives: ZIP, tar, gzip, bzip2, 7z, RAR, ISO, JAR, DMG, and CPIO are scanned recursively.
  • Email exports: Gmail Takeout .mbox, Outlook .pst, and individual .eml and .msg files. See Scan Gmail for PII.
  • Database files: SQLite and Microsoft Access (.mdb / .accdb).
  • Network shares: Windows / Samba SMB shares are scanned directly over the network, with no mounting required. See Scan an SMB Network Share.
  • Live monitoring: piicrawler watch polls a directory tree and dispatches violations to stdout, a webhook, or the local database. See Watch Mode & Policies.
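
Polling-based watching can be sketched as a loop that snapshots file modification times, diffs each snapshot against the previous one, and scans only new or changed files. This is a minimal illustration under stated assumptions, not the implementation behind piicrawler watch: the `watch`, `scan`, and `dispatch` names, the 5-second default interval, the `polls` cap, and the mtime-based change test are all hypothetical. A real `dispatch` hook could print to stdout, POST to a webhook, or write to a database, mirroring the three sinks listed above.

```python
import os
import time
from typing import Callable, Dict, Iterable, List, Optional

Snapshot = Dict[str, float]  # path -> last-modified time


def snapshot(root: str) -> Snapshot:
    """Record the modification time of every file under `root`."""
    state: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished between listing and stat
    return state


def changed_files(old: Snapshot, new: Snapshot) -> List[str]:
    """Paths that are new, or whose mtime differs from the previous poll."""
    return sorted(p for p, mtime in new.items() if old.get(p) != mtime)


def watch(
    root: str,
    scan: Callable[[str], Iterable[str]],       # hypothetical detector hook
    dispatch: Callable[[str, str], None],       # hypothetical sink hook
    interval: float = 5.0,
    polls: Optional[int] = None,                # None = poll forever
) -> None:
    """Poll `root`, scan changed files, and pass each finding to `dispatch`."""
    previous: Snapshot = {}
    count = 0
    while polls is None or count < polls:
        current = snapshot(root)
        for path in changed_files(previous, current):
            for finding in scan(path):
                dispatch(path, finding)
        previous = current
        count += 1
        if polls is None or count < polls:
            time.sleep(interval)
```

Because the first poll compares against an empty snapshot, every existing file is scanned once at startup; later polls only touch files whose modification time changed. A fixed-interval poll trades some latency for portability, since native change notifications may not be available on every platform or network mount.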

For remote sources still on the way (OneDrive, SharePoint, S3, hosted SQL databases, and others), see the roadmap.
