Before you can start a scan, you need to register your copy of PII Crawler or start a demo/trial.

./piicrawler register

./piicrawler register
Please enter a valid email:
Successfully registered product. license.lic file downloaded to this directory
Keep this file in the same directory as piicrawler to avoid having to re-register in the future.
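
Because the license file has to sit next to the binary, a wrapper script can check for it before starting a scan. The following is a minimal sketch, assuming that the presence of license.lic is a good enough signal that registration has been done (PII Crawler may validate the file further). The helper names license_path and needs_registration are hypothetical and not part of PII Crawler; the demo uses a temporary directory and a placeholder file instead of a real installation.

```python
import tempfile
from pathlib import Path


def license_path(binary):
    """Return where license.lic is expected: the directory that holds the binary."""
    return Path(binary).resolve().parent / "license.lic"


def needs_registration(binary):
    """True when no license.lic file sits next to the binary."""
    return not license_path(binary).is_file()


# Demo with a placeholder file standing in for the piicrawler binary.
with tempfile.TemporaryDirectory() as tmp:
    fake_binary = Path(tmp) / "piicrawler"
    fake_binary.write_text("placeholder", encoding="utf-8")

    before = needs_registration(fake_binary)   # no license file yet
    license_path(fake_binary).write_text("placeholder", encoding="utf-8")
    after = needs_registration(fake_binary)    # license file now present

print(before, after)
```

If needs_registration returns True, run ./piicrawler register before scanning.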

Scan many files

Scan all files under a root path:

./piicrawler scan <root-path>


For example: ./piicrawler scan ~ or ./piicrawler scan /mnt/shared-drive

Scan a single file

Scan a single file:

./piicrawler scanfile /home/user/dotfiles/emacs/.emacs.d/elpa/archives/melpa/archive-contents

The output of scanfile is JSON. It is meant to be both human-readable and easy to pass to other applications and systems. The output structure is defined in Results Storage.

  "path": "/home/user/dotfiles/emacs/.emacs.d/elpa/archives/melpa/archive-contents",
  "size": 2165865,
  "mime_type": "text/plain; charset=utf-8",
  "unique_common_first_names": 102,
  "unique_common_last_names": 337,
  "unique_common_email_domain_suffixes": 1,
  "unique_emails": 2072,
  "unique_addresses": 0,
  "matches": {
    "address": null,
    "csz": null,
    "email": [
      (2072 email addresses)
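
Since the output is meant to be consumed by other programs, here is a minimal Python sketch of how a script might summarize a scanfile result. It assumes the JSON is written to standard output as a single object and uses only the keys visible in the excerpt above; the full schema is defined in Results Storage. The functions run_scanfile and summarize are hypothetical helpers, and the sample data is a hand-written stand-in (email matches are assumed to be plain strings), not real scanner output.

```python
import json
import subprocess


def run_scanfile(path, binary="./piicrawler"):
    """Run `piicrawler scanfile <path>` and parse the result.

    Assumption: the JSON document is printed to stdout.
    """
    result = subprocess.run(
        [binary, "scanfile", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarize(report):
    """Collect the unique_* counters and the number of matches per category."""
    counts = {key: value for key, value in report.items()
              if key.startswith("unique_")}
    matches = report.get("matches") or {}
    # Categories without hits appear as null in the excerpt, so treat None as empty.
    match_counts = {name: len(items or []) for name, items in matches.items()}
    return {"path": report.get("path"),
            "counts": counts,
            "match_counts": match_counts}


# Hand-written stand-in for scanfile output, shaped like the excerpt above.
sample_output = """
{
  "path": "/tmp/example.txt",
  "size": 120,
  "mime_type": "text/plain; charset=utf-8",
  "unique_emails": 2,
  "unique_addresses": 0,
  "matches": {
    "address": null,
    "csz": null,
    "email": ["a@example.com", "b@example.com"]
  }
}
"""

summary = summarize(json.loads(sample_output))
print(json.dumps(summary, indent=2))
```

With a real installation, summarize(run_scanfile("/path/to/file")) would replace the stand-in data.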

Exact Match

If you are handling an incident and know what you are searching for, you can provide groups of terms to match against. PII Crawler runs a supplemental exact-match search when an exact-match.json file is provided in the same directory. The file is a map from the id or name of each group to a list of strings, which are that group's terms. If all terms in a group are found within a certain distance of each other (usually the same file), an exact_match is triggered.


{
    "Mrs. Hilda Schrader Whitcher": ["whitcher", "078-05-1120"],
    "Hilda Schrader Whitcher no dash": ["whitcher", "078051120"]
}

Note: This is no longer a real SSN but there is an interesting story behind it.
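
When there are many people or identifiers to search for, the file can be generated rather than written by hand. The sketch below builds two groups per identifier, one with dashes and one without, mirroring the example above. The helpers build_groups and write_exact_match are hypothetical and not part of PII Crawler, and the group labels are arbitrary names chosen for illustration. The demo writes to a temporary directory; in practice, pass the directory where PII Crawler expects exact-match.json.

```python
import json
import tempfile
from pathlib import Path


def build_groups(label, keyword, ssn):
    """Return two term groups for one person: SSN with and without dashes."""
    return {
        label: [keyword, ssn],
        f"{label} no dash": [keyword, ssn.replace("-", "")],
    }


def write_exact_match(groups, directory):
    """Write the groups to exact-match.json inside `directory`; return the path."""
    target = Path(directory) / "exact-match.json"
    target.write_text(json.dumps(groups, indent=4), encoding="utf-8")
    return target


groups = build_groups("Mrs. Hilda Schrader Whitcher", "whitcher", "078-05-1120")

# Write the file, then read it back to confirm it is valid JSON.
with tempfile.TemporaryDirectory() as tmp:
    written = write_exact_match(groups, tmp)
    reloaded = json.loads(written.read_text(encoding="utf-8"))

print(json.dumps(reloaded, indent=4))
```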
