Results Storage

PII Crawler stores scan results in an SQLite database. SQLite is a single-file database, and most programming languages have built-in or readily available support for connecting to it. By default, this file (piicrawler.db) is created in the directory you run PII Crawler from.

PII Crawler intentionally exposes the application’s database to the user so the user can manually or programmatically:

  • View the data that is being collected
  • Run custom queries on the data
  • Create alerts from findings
  • Customize which files are scanned
  • Customize how files are scanned
  • Customize the behavior of PII Crawler

You can think of this database as an API to PII Crawler.
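
As a starting point for treating the database as an API, the sketch below lists every table and its columns without assuming any table name. It uses only Python's standard sqlite3 module; the function name describe_database is illustrative and not part of PII Crawler.

```python
import sqlite3


def describe_database(conn):
    """Return {table_name: [column names]} for every table in the database."""
    # sqlite_master is SQLite's built-in catalog of tables, indexes, and views.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # Double any embedded quotes so the table name is a safe identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info returns one row per column; index 1 is the name.
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [column[1] for column in columns]
    return schema


# Usage (assumes piicrawler.db is in the current directory; note that
# sqlite3.connect creates an empty file if the path does not exist):
#
#   conn = sqlite3.connect("piicrawler.db")
#   for table, columns in describe_database(conn).items():
#       print(table, columns)
#   conn.close()
```

Inspecting the schema first is a cautious way to confirm the table and column names in your own copy of the database before writing queries against it.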


		path TEXT primary key,
		scan_started_at INTEGER,
		scan_finished_at INTEGER,
		size INTEGER,
		extension TEXT,
		mime_type TEXT,
		csz_clusters INTEGER,
		unique_csz_clusters INTEGER,
		unique_common_first_names INTEGER,
		unique_common_last_names INTEGER,
		potential_tax_ids_or_ssns INTEGER,
		text_extracted BOOLEAN default 0 NOT NULL,
		unique_common_email_domain_suffixes INTEGER,
		unique_emails INTEGER,
		unique_addresses INTEGER,
		results TEXT,
		skip BOOLEAN default 0 NOT NULL

Column                               Description
path                                 absolute path to the file
scan_started_at                      Unix timestamp of when the scan started
scan_finished_at                     Unix timestamp of when the scan finished
size                                 size of the file in bytes
extension                            file extension (ex: .pdf, .csv)
mime_type                            detected file MIME type (ex: application/json, image/jpeg)
csz_clusters                         count of city, state, ZIP combination matches
unique_csz_clusters                  count of unique city, state, ZIP combination matches
unique_common_first_names            count of unique common first names
unique_common_last_names             count of unique common last names
potential_tax_ids_or_ssns            count of potential SSNs or Tax IDs
text_extracted                       boolean; true if file parsing, text extraction, or OCR was used
unique_common_email_domain_suffixes  count of common email suffixes found (supplemental to unique_emails)
unique_emails                        count of unique full email addresses
unique_addresses                     count of unique street addresses with a matching city, state, and ZIP
results                              not yet used
skip                                 boolean; if true, the file will not be scanned
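
The columns above can be queried and updated directly. The sketch below shows two hedged examples: finding files with potential SSNs or Tax IDs, and setting skip for every file with a given extension. The table name is not shown in this section, so it is passed in as a parameter; the name "files" in the usage comment is purely hypothetical, and the helper names are illustrative rather than part of PII Crawler.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def files_with_possible_ssns(conn, table, minimum=1):
    """Return (path, count) rows with at least `minimum` potential SSNs/Tax IDs."""
    sql = (
        f"SELECT path, potential_tax_ids_or_ssns FROM {quote_ident(table)} "
        "WHERE potential_tax_ids_or_ssns >= ? "
        "ORDER BY potential_tax_ids_or_ssns DESC, path"
    )
    return conn.execute(sql, (minimum,)).fetchall()


def skip_extension(conn, table, extension):
    """Set skip = 1 for every file with this extension; return rows changed."""
    cursor = conn.execute(
        f"UPDATE {quote_ident(table)} SET skip = 1 WHERE extension = ?",
        (extension,),
    )
    conn.commit()
    return cursor.rowcount


# Usage ("files" is a hypothetical table name; check your database's schema):
#
#   conn = sqlite3.connect("piicrawler.db")
#   for path, count in files_with_possible_ssns(conn, "files"):
#       print(count, path)
#   skip_extension(conn, "files", ".log")
#   conn.close()
```

The first function is the kind of query an alert could be built on; the second relies on the skip column described above to exclude files from scanning.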

