Before you can start a scan, you need to register your copy of PII Crawler or start a demo/trial.
./piicrawler register
./piicrawler register
Please enter a valid email:
me@email.com
Successfully registered product. license.lic file downloaded to this directory
Keep this file in the same directory as piicrawler to avoid having to re-register in the future.
Scan files:
./piicrawler scan <root-path>
Examples:
./piicrawler scan ~
./piicrawler scan /mnt/shared-drive
piicrawler.exe scan C:\Users\
⏯ You can pause and resume scanning at any time: PII Crawler keeps track of what it has already scanned. To stop, press Ctrl + C.
Scan a single file:
./piicrawler scanfile /home/user/dotfiles/emacs/.emacs.d/elpa/archives/melpa/archive-contents
The output of scanfile is JSON. It is meant to be both human-readable and easy to pass to other applications and systems. The output structure is defined in Results Storage.
{
"path": "/home/user/dotfiles/emacs/.emacs.d/elpa/archives/melpa/archive-contents",
"size": 2165865,
"mime_type": "text/plain; charset=utf-8",
"unique_common_first_names": 102,
"unique_common_last_names": 337,
"unique_common_email_domain_suffixes": 1,
"unique_emails": 2072,
"unique_addresses": 0,
"matches": {
"address": null,
"csz": null,
"email": [
"*****************",
"*****************",
(2072 email addresses)
...
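Because scanfile emits JSON, its output can be consumed by a script. The following is a minimal Python sketch, not part of PII Crawler itself: the helper names `summarize` and `scan_file` are hypothetical, the field names are taken from the sample output above, and it assumes the command prints only the JSON document to standard output.

```python
import json
import subprocess

# Count fields taken from the sample scanfile output; other fields may exist.
COUNT_FIELDS = (
    "unique_common_first_names",
    "unique_common_last_names",
    "unique_emails",
    "unique_addresses",
)


def summarize(raw_json: str) -> dict:
    """Parse scanfile output and return the path plus the non-zero counts."""
    result = json.loads(raw_json)
    counts = {name: result.get(name, 0) for name in COUNT_FIELDS}
    return {
        "path": result.get("path"),
        # Drop counts that are zero or null so only actual findings remain.
        "counts": {name: value for name, value in counts.items() if value},
    }


def scan_file(path: str, binary: str = "./piicrawler") -> dict:
    """Run `piicrawler scanfile <path>` and summarize its output.

    Assumption: the command writes only the JSON document to stdout.
    """
    completed = subprocess.run(
        [binary, "scanfile", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return summarize(completed.stdout)
```

Calling `scan_file("/path/to/file")` from the directory that contains piicrawler would return a small dictionary with the file path and whichever counts were non-zero.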
To view the results after the scan has finished run:
./piicrawler serve
This starts a web-based UI on a local HTTP server at http://localhost:8080. There you can view all the PII found and sort it by size, type, etc.
If you are responding to an incident and know what you are searching for, you can provide groups of terms to match against. PII Crawler performs a supplemental exact-match search when an exact-match.json file is provided in the same directory. The file is a map from the id or name of each group to a list of term strings. If all of a group's terms are found within a distance of each other (usually the same file), an exact_match is triggered.
Example:
{
"Mrs. Hilda Schrader Whitcher": ["whitcher", "078-05-1120"],
"Hilda Schrader Whitcher no dash": ["whitcher", "078051120"]
}
Note: This is no longer a real SSN but there is an interesting story behind it.
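When the term groups come from another system (for example, an incident spreadsheet), the file can be generated rather than written by hand. The sketch below is one possible way to do that in Python; `write_exact_match` is a hypothetical helper, and the validation rules (non-empty names, non-empty lists of non-empty strings) are assumptions based on the format described above, not checks that PII Crawler is known to enforce.

```python
import json
from pathlib import Path


def write_exact_match(groups: dict, directory: str = ".") -> Path:
    """Validate term groups and write them to exact-match.json in `directory`."""
    for name, terms in groups.items():
        if not isinstance(name, str) or not name:
            raise ValueError("group names must be non-empty strings")
        if not isinstance(terms, list) or not terms:
            raise ValueError(f"group {name!r} needs a non-empty list of terms")
        if not all(isinstance(term, str) and term for term in terms):
            raise ValueError(f"group {name!r} contains a non-string or empty term")
    target = Path(directory) / "exact-match.json"
    # Same shape as the example above: {"group name": ["term", ...], ...}
    target.write_text(json.dumps(groups, indent=2), encoding="utf-8")
    return target
```

For instance, `write_exact_match({"Case 1234": ["whitcher", "078-05-1120"]})` writes the file into the current directory; the group name `Case 1234` is only an illustration.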