Usage

Register

Before you can start a scan, register your copy of PII Crawler or start a demo/trial:

./piicrawler register

./piicrawler register
Please enter a valid email:
me@email.com
Successfully registered product. license.lic file downloaded to this directory
Keep this file in the same directory as piicrawler to avoid having to re-register in the future.

Scan many files

Scan every file under a root path:

./piicrawler scan <root-path>

Examples:

  • ./piicrawler scan ~
  • ./piicrawler scan /mnt/shared-drive
  • piicrawler.exe scan C:\Users\

⏯ You can pause and resume a scan at any time; PII Crawler keeps track of what it has already scanned. To stop, press Ctrl + C.

Scan a single file

Scan a single file:

./piicrawler scanfile /home/user/dotfiles/emacs/.emacs.d/elpa/archives/melpa/archive-contents

The output of scanfile is JSON. It is meant to be both human-readable and easy to pass to other applications and systems. The output structure is defined in Results Storage.

{
  "path": "/home/user/dotfiles/emacs/.emacs.d/elpa/archives/melpa/archive-contents",
  "size": 2165865,
  "mime_type": "text/plain; charset=utf-8",
  "unique_common_first_names": 102,
  "unique_common_last_names": 337,
  "unique_common_email_domain_suffixes": 1,
  "unique_emails": 2072,
  "unique_addresses": 0,
  "matches": {
    "address": null,
    "csz": null,
    "email": [
      "*****************",
      "*****************",
      (2072 email addresses)
      ...
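
Because the report is JSON, it can be post-processed with a few lines of code. The following is a minimal Python sketch, not part of PII Crawler itself: it assumes the complete (untruncated) output is a single valid JSON object, and it relies only on field names visible in the sample above (path and the unique_* counters). The function name summarize_scanfile and the shortened sample report are hypothetical.

```python
import json


def summarize_scanfile(raw: str) -> dict:
    """Return the path plus every non-zero unique_* counter from one report."""
    report = json.loads(raw)
    counts = {
        key: value
        for key, value in report.items()
        if key.startswith("unique_") and isinstance(value, int) and value > 0
    }
    return {"path": report.get("path"), "counts": counts}


# Hypothetical, shortened report that mimics the structure shown above.
sample = json.dumps({
    "path": "/tmp/example.txt",
    "size": 120,
    "mime_type": "text/plain; charset=utf-8",
    "unique_emails": 2,
    "unique_addresses": 0,
    "matches": {"address": None, "email": ["*****", "*****"]},
})

print(summarize_scanfile(sample))
```

In practice the raw string could come from capturing the output of ./piicrawler scanfile <path>, for example with subprocess.run(..., capture_output=True, text=True), assuming the JSON is written to standard output.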

Viewing the results

To view the results after the scan has finished, run:

./piicrawler serve

This starts a web-based UI on a local HTTP server at http://localhost:8080, where you can view all the PII found and sort it by size, type, etc.

Exact Match

If you are handling an incident and know what you are searching for, you can provide groups of terms to match against. PII Crawler runs a supplemental exact-match search when an exact-match.json file is provided in the same directory. The file is a map from the id/name of each group to a list of term strings. If all terms in a group are found within a certain distance of each other (usually the same file), an exact_match is triggered.

Example:

{
    "Mrs. Hilda Schrader Whitcher": ["whitcher", "078-05-1120"],
    "Hilda Schrader Whitcher no dash": ["whitcher", "078051120"]
}

Note: This is no longer a real SSN, but there is an interesting story behind it.
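
When there are many people to search for, the file can be generated rather than written by hand. The Python sketch below is one possible approach, not an official tool: the helper names build_groups and write_exact_match are hypothetical, and the choices to lowercase the surname and to add a second "no dash" group simply mirror the example above. Whether PII Crawler matches terms case-sensitively is not stated here, so treat the lowercasing as an assumption.

```python
import json
from pathlib import Path


def build_groups(people: dict[str, tuple[str, str]]) -> dict[str, list[str]]:
    """Create two term groups per person: SSN with dashes and without."""
    groups: dict[str, list[str]] = {}
    for name, (surname, ssn) in people.items():
        # Lowercasing mirrors the example file; it is an assumption, not a rule.
        groups[name] = [surname.lower(), ssn]
        groups[f"{name} no dash"] = [surname.lower(), ssn.replace("-", "")]
    return groups


def write_exact_match(groups: dict[str, list[str]], directory: str = ".") -> Path:
    """Write the groups to exact-match.json in the given directory."""
    target = Path(directory) / "exact-match.json"
    target.write_text(json.dumps(groups, indent=4), encoding="utf-8")
    return target


groups = build_groups({"Mrs. Hilda Schrader Whitcher": ("Whitcher", "078-05-1120")})
print(json.dumps(groups, indent=4))
```

Calling write_exact_match(groups) then writes the file to the current directory by default; pass a different directory if the file needs to live elsewhere.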
