
Results Storage

Last updated April 2026

PII Crawler stores scan results in an SQLite database. SQLite is a single-file database, and most programming languages have built-in support for connecting to it. By default, results are stored in the user's home directory under .piicrawler/scans/<scan-id>.sqlite.
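
As a minimal sketch of connecting from a script, the Python snippet below builds the default path described above and opens the file read-only with the standard-library sqlite3 module. The helper names scan_db_path and open_scan_db are illustrative and not part of PII Crawler, and the scan ID you pass in must be one that actually exists on your machine.

```python
import sqlite3
from pathlib import Path


def scan_db_path(scan_id: str) -> Path:
    """Build the default results path for a scan (layout taken from the text above)."""
    return Path.home() / ".piicrawler" / "scans" / f"{scan_id}.sqlite"


def open_scan_db(scan_id: str) -> sqlite3.Connection:
    """Open a scan database read-only so exploratory queries cannot modify it."""
    path = scan_db_path(scan_id)
    # mode=ro makes SQLite raise an error instead of silently creating
    # an empty database when the path is wrong.
    return sqlite3.connect(f"{path.as_uri()}?mode=ro", uri=True)
```

Opening read-only is a cautious default for viewing data; a script that changes flags such as skip or ignored needs a normal read-write connection instead.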

PII Crawler intentionally exposes its database so that you can, manually or programmatically:

  • View the data that is being collected
  • Run custom queries on the data
  • Create alerts from findings
  • Customize which files are scanned
  • Customize how files are scanned
  • Customize the behavior of PII Crawler

You can think of this database as an API to PII Crawler.

Schema

Files Table

CREATE TABLE IF NOT EXISTS files (
    path TEXT PRIMARY KEY,
    sha256 TEXT,
    size INTEGER NOT NULL,
    extension TEXT NOT NULL,
    mime_type TEXT NOT NULL DEFAULT '',
    skip BOOLEAN DEFAULT 0 NOT NULL,
    last_modified INTEGER DEFAULT 0 NOT NULL,
    parent_path TEXT DEFAULT '' NOT NULL,
    last_scanned_at INTEGER DEFAULT 0 NOT NULL,
    scan_attempts INTEGER DEFAULT 0 NOT NULL,
    last_error TEXT,
    scan_started_at INTEGER,
    scan_finished_at INTEGER
);
Column            Description
path              Absolute path to the file (primary key)
sha256            SHA-256 hash of the file content (used for deduplication)
size              Size of the file in bytes
extension         File extension (e.g., .pdf, .csv)
mime_type         Detected MIME type of the file (e.g., application/json, image/jpeg)
skip              Boolean flag; if true, the file will not be scanned
last_modified     Unix timestamp of when the file was last modified
parent_path       Path to the parent directory
last_scanned_at   Unix timestamp of the last scan attempt
scan_attempts     Number of times scanning has been attempted for this file
last_error        Error message from the last failed scan attempt (if any)
scan_started_at   Unix timestamp of when the current or last scan started
scan_finished_at  Unix timestamp of when the current or last scan finished
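
The skip, scan_attempts, and last_error columns are the ones most likely to be touched from a script. The sketch below is one way to use them, assuming a read-write sqlite3 connection to a scan database: it flags every file with a given extension so it is not scanned, and lists files whose last attempt recorded an error. The function names are hypothetical. This page does not say when PII Crawler re-reads the skip flag or whether last_error is cleared after a later successful scan, so treat both behaviors as assumptions to verify.

```python
import sqlite3


def skip_extension(conn: sqlite3.Connection, extension: str) -> int:
    """Set skip = 1 for every not-yet-skipped file with this extension.

    Returns the number of rows changed. Extensions are assumed to be stored
    with a leading dot, as in the examples in the column table (.pdf, .csv).
    """
    with conn:  # commits on success, rolls back if the UPDATE raises
        cur = conn.execute(
            "UPDATE files SET skip = 1 WHERE extension = ? AND skip = 0",
            (extension,),
        )
    return cur.rowcount


def failed_files(conn: sqlite3.Connection) -> list:
    """Return (path, scan_attempts, last_error) for files with a recorded error."""
    return conn.execute(
        "SELECT path, scan_attempts, last_error FROM files "
        "WHERE last_error IS NOT NULL "
        "ORDER BY scan_attempts DESC, path"
    ).fetchall()
```

Both statements use bound parameters rather than string formatting, so paths or extensions containing quotes cannot break the query.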

Matches Table

CREATE TABLE IF NOT EXISTS matches (
    id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    kind TEXT NOT NULL,
    ignored INTEGER DEFAULT 0 NOT NULL,
    UNIQUE(text, kind)
);
Column   Description
id       Integer primary key, assigned automatically by SQLite
text     The matched PII text (e.g., the actual email address or name)
kind     Type of PII match (e.g., "emails", "names", "ssns", "addresses")
ignored  Boolean flag (0 or 1); if 1, this match is excluded from results
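
Because of the UNIQUE(text, kind) constraint, a given piece of text appears at most once per kind, so a single row can be targeted by that pair. The sketch below, with hypothetical function names, marks one match as ignored and counts the remaining matches per kind. It assumes a read-write sqlite3 connection and that PII Crawler honors the ignored flag as the column description states.

```python
import sqlite3


def ignore_match(conn: sqlite3.Connection, text: str, kind: str) -> bool:
    """Set ignored = 1 for the match identified by (text, kind).

    Returns True if such a match exists, False otherwise.
    """
    with conn:  # commits on success, rolls back if the UPDATE raises
        cur = conn.execute(
            "UPDATE matches SET ignored = 1 WHERE text = ? AND kind = ?",
            (text, kind),
        )
    return cur.rowcount == 1


def counts_by_kind(conn: sqlite3.Connection) -> dict:
    """Count distinct, non-ignored matches for each kind."""
    rows = conn.execute(
        "SELECT kind, COUNT(*) FROM matches "
        "WHERE ignored = 0 GROUP BY kind ORDER BY kind"
    ).fetchall()
    return dict(rows)
```

Note that these counts are of distinct matched strings, not of files or occurrences; relating matches to files requires the file_matches table.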

File Matches Table

CREATE TABLE IF NOT EXISTS file_matches (
    sha256 TEXT,
    match_id INTEGER,
    PRIMARY KEY (sha256, match_id)
);
Column    Description
sha256    SHA256 hash of file content (references files.sha256)
match_id  ID of the match (references matches.id)

This junction table links files to their PII matches using content hashing for deduplication. Multiple files with identical content share the same matches.
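
To illustrate how the three tables fit together, the sketch below joins files to matches through file_matches by content hash. The function name matches_for_file is hypothetical; the table and column names come from the schema on this page. Filtering out ignored matches is a choice made for this example.

```python
import sqlite3


def matches_for_file(conn: sqlite3.Connection, path: str) -> list:
    """Return (kind, text) pairs for the non-ignored matches found in one file."""
    return conn.execute(
        """
        SELECT m.kind, m.text
        FROM files AS f
        JOIN file_matches AS fm ON fm.sha256 = f.sha256
        JOIN matches AS m ON m.id = fm.match_id
        WHERE f.path = ? AND m.ignored = 0
        ORDER BY m.kind, m.text
        """,
        (path,),
    ).fetchall()
```

Since the join goes through sha256 rather than path, two files with identical content return the same list. A file whose sha256 is still NULL has no rows in file_matches to join against and returns an empty list.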
