Introduction
PII Crawler is a local-first scanner that looks through your files, mailboxes, database exports, and network shares for Personally Identifiable Information (PII) and other sensitive data. Use it to:
- Secure, encrypt, or redact data before it leaks
- Investigate an incident and understand exposure
- Build a Data Leak Prevention (DLP) workflow
- Meet data compliance obligations (GDPR, CCPA, and similar)
PII Crawler is influenced by the Unix Philosophy and tries to do one thing well: find sensitive data with a low false-positive rate. It is not a complete DLP tool — it doesn't send alerts or push notifications on its own — but it composes well with one. Scans run entirely on your machine; results never leave it. See Security for the full network-traffic disclosure.
How you run it
PII Crawler ships as a single cross-platform binary for macOS, Windows, and Linux. The same binary gives you three interfaces:
- Interactive TUI — run piicrawler with no arguments for a keyboard-driven terminal UI
- Web UI — run piicrawler serve and open http://localhost:3001
- Command line — piicrawler scan, watch, dsar, and report for one-shot or scripted use
Install in under five minutes →
Supported PII Data Types
PII Crawler ships with high-fidelity detectors for:
- U.S. Social Security Number (SSN)
- First and last names
- Street address and City / State / Zip clusters
- Email address
- Date of birth
- U.S. passport number
- Credit card number (Luhn-validated)
- Driver's license number
- AWS credentials
- User-defined exact-match terms and custom regex patterns
See the full list with detection notes in PII Data Types, and add your own rules with custom regex or proximity groups.
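To illustrate what "Luhn-validated" means for the credit card detector, here is a minimal Python sketch of the standard Luhn checksum applied to regex candidates. This is not PII Crawler's own code: the candidate pattern and the names luhn_valid and find_card_numbers are assumptions made for this example. It only shows why a checksum step removes most random digit runs that a bare regex would flag.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens. Real detectors are usually stricter.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def find_card_numbers(text: str) -> list[str]:
    """Return candidate card numbers in `text` that pass the Luhn check."""
    return [
        m.group(0)
        for m in CARD_CANDIDATE.finditer(text)
        if luhn_valid(m.group(0))
    ]
```

A 16-digit order number such as 1234567890123456 matches the pattern but fails the checksum, so it is dropped, while the well-known test number 4111 1111 1111 1111 passes.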
Supported Data Sources
- Files and directories — over 50 file types including PDF, Office (Word/Excel/PowerPoint), Apple iWork, OpenDocument, EPUB, HTML, plain text, and source code. See Supported File Types.
- Images — text inside JPEG/PNG/TIFF/BMP and scanned PDFs is extracted with built-in OCR.
- Archives — ZIP, tar, gzip, bzip2, 7z, RAR, ISO, JAR, DMG, and CPIO are scanned recursively.
- Email exports — Gmail Takeout .mbox, Outlook .pst, individual .eml and .msg files. See Scan Gmail for PII.
- Database files — SQLite and Microsoft Access (.mdb/.accdb).
- Network shares — Windows / Samba SMB shares scanned directly over the network with no mounting required. See Scan an SMB Network Share.
- Live monitoring — piicrawler watch polls a directory tree and dispatches violations to stdout, a webhook, or the local database. See Watch Mode & Policies.
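As a rough picture of what polling a directory tree involves, the Python sketch below snapshots file modification times and reports new or modified files on each pass. It is an illustration only, not how piicrawler watch is implemented: the function names, the JSON event shape, and the polling interval are all assumptions, and a real watcher would scan each changed file rather than just print its path.

```python
import json
import os
import time


def snapshot(root: str) -> dict[str, float]:
    """Map every file path under `root` to its last-modified time."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished between listing and stat
    return state


def changed_files(previous: dict[str, float], current: dict[str, float]) -> list[str]:
    """Return paths that are new or whose modification time differs."""
    return sorted(p for p, mtime in current.items() if previous.get(p) != mtime)


def watch(root: str, interval_seconds: float = 5.0, max_polls=None) -> None:
    """Poll `root` and print one JSON line per new or modified file."""
    previous = snapshot(root)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval_seconds)
        current = snapshot(root)
        for path in changed_files(previous, current):
            # Hypothetical event shape; a real watcher would scan the file
            # here and dispatch any violations it finds.
            print(json.dumps({"event": "changed", "path": path}))
        previous = current
        polls += 1
```

Polling trades some latency for portability: it behaves the same on local disks and network shares, where operating system file-change notifications are often unavailable.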
For remote sources still on the way (OneDrive, SharePoint, S3, hosted SQL databases, etc.), see the road map.