PII Crawler scan results are stored in an SQLite database. SQLite is a single-file database, and most programming languages either ship with support for it or have a widely used library for connecting to it. By default, results are stored in the user's home directory under `.piicrawler/scans/<scan-id>.sqlite`.
PII Crawler intentionally exposes the application's database so that users can work with it manually or programmatically.
You can think of this database as an API to PII Crawler.
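As a minimal sketch of treating the database as an API, the Python snippet below opens a scan database with the standard-library `sqlite3` module. The helper names (`scan_db_path`, `open_scan_db`, `list_tables`) are illustrative and not part of PII Crawler, the scan ID is a placeholder you have to supply, and the directory layout is simply the default location described above.

```python
import sqlite3
from pathlib import Path
from typing import List, Optional


def scan_db_path(scan_id: str, home: Optional[Path] = None) -> Path:
    """Build the default database location for a scan ID (hypothetical helper)."""
    base = home if home is not None else Path.home()
    return base / ".piicrawler" / "scans" / f"{scan_id}.sqlite"


def open_scan_db(db_path: Path) -> sqlite3.Connection:
    """Open an existing scan database read-only via an SQLite URI."""
    # mode=ro makes SQLite reject writes, so an exploratory query
    # cannot modify the application's data by accident.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row  # allow access to columns by name
    return conn


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row["name"] for row in rows]
```

Opening the file read-only is a cautious default for reporting scripts. A script that needs to change data would have to open the file without `mode=ro`, ideally while PII Crawler is not running a scan.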
CREATE TABLE IF NOT EXISTS files (
    path TEXT PRIMARY KEY,
    sha256 TEXT,
    size INTEGER NOT NULL,
    extension TEXT NOT NULL,
    mime_type TEXT NOT NULL DEFAULT '',
    skip BOOLEAN DEFAULT 0 NOT NULL,
    last_modified INTEGER DEFAULT 0 NOT NULL,
    parent_path TEXT DEFAULT '' NOT NULL,
    last_scanned_at INTEGER DEFAULT 0 NOT NULL,
    scan_attempts INTEGER DEFAULT 0 NOT NULL,
    last_error TEXT,
    scan_started_at INTEGER,
    scan_finished_at INTEGER
);
Column | Description |
---|---|
path | Absolute path to file (primary key) |
sha256 | SHA-256 hash of the file content (used for deduplication) |
size | Size of the file in bytes |
extension | File extension (e.g., .pdf, .csv) |
mime_type | Detected MIME type of the file (e.g., application/json, image/jpeg) |
skip | Boolean flag (0 or 1); if 1, the file will not be scanned |
last_modified | Unix timestamp of when the file was last modified |
parent_path | Path to the parent directory |
last_scanned_at | Unix timestamp of the last scan attempt |
scan_attempts | Number of times scanning has been attempted for this file |
last_error | Error message from the last failed scan attempt (if any) |
scan_started_at | Unix timestamp when the current/last scan started |
scan_finished_at | Unix timestamp when the current/last scan finished |
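To illustrate how these columns can be used, here is a hedged sketch that lists files whose last scan attempt recorded an error. The function name `files_with_errors` is hypothetical. The sketch assumes that `last_error` is NULL or empty when the last attempt succeeded and that the Unix timestamps are in seconds, neither of which is stated explicitly above.

```python
import sqlite3
from datetime import datetime, timezone


def files_with_errors(conn: sqlite3.Connection):
    """Return (path, scan_attempts, last_error, last_scanned_at) per failed file."""
    rows = conn.execute(
        """
        SELECT path, scan_attempts, last_error, last_scanned_at
        FROM files
        WHERE last_error IS NOT NULL AND last_error != ''
        ORDER BY scan_attempts DESC, path
        """
    ).fetchall()
    # Assumption: timestamps are Unix seconds; convert them to aware UTC datetimes.
    return [
        (path, attempts, error, datetime.fromtimestamp(ts, tz=timezone.utc))
        for path, attempts, error, ts in rows
    ]
```

Files with the most attempts are listed first, which tends to surface the inputs that fail repeatedly.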
CREATE TABLE IF NOT EXISTS matches (
    id INTEGER PRIMARY KEY,
    text TEXT NOT NULL,
    kind TEXT NOT NULL,
    ignored INTEGER DEFAULT 0 NOT NULL,
    UNIQUE(text, kind)
);
Column | Description |
---|---|
id | Integer primary key; SQLite assigns it automatically on insert (rowid alias) |
text | The matched PII text (e.g., the actual email address or name) |
kind | Type of PII match (e.g., "emails", "names", "ssns", "addresses") |
ignored | Boolean flag (0 or 1); if 1, this match is excluded from results |
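As a small example of reading this table, the sketch below counts distinct matches per `kind`, leaving out rows flagged as ignored. The function name `match_counts_by_kind` is illustrative only. Because of the `UNIQUE(text, kind)` constraint, each count is the number of distinct matched strings of that kind, not the number of files that contain them.

```python
import sqlite3
from typing import Dict


def match_counts_by_kind(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count non-ignored rows in the matches table, grouped by kind."""
    rows = conn.execute(
        """
        SELECT kind, COUNT(*)
        FROM matches
        WHERE ignored = 0
        GROUP BY kind
        ORDER BY kind
        """
    ).fetchall()
    return {kind: count for kind, count in rows}
```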
CREATE TABLE IF NOT EXISTS file_matches (
    sha256 TEXT,
    match_id INTEGER,
    PRIMARY KEY (sha256, match_id)
);
Column | Description |
---|---|
sha256 | SHA-256 hash of the file content (corresponds to files.sha256; no foreign key constraint is declared) |
match_id | ID of the match (corresponds to matches.id; no foreign key constraint is declared) |
This junction table links files to their PII matches using content hashing for deduplication. Multiple files with identical content share the same matches.
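The join through the junction table can be sketched as follows. The function name `matches_for_file` is hypothetical; the table and column names come from the schemas above. The query goes from a file's path to its content hash, then to the match IDs recorded for that hash, and finally to the match rows themselves.

```python
import sqlite3
from typing import List, Tuple


def matches_for_file(conn: sqlite3.Connection, path: str) -> List[Tuple[str, str]]:
    """Return (kind, text) for every non-ignored match linked to a file's content."""
    rows = conn.execute(
        """
        SELECT m.kind, m.text
        FROM files AS f
        JOIN file_matches AS fm ON fm.sha256 = f.sha256
        JOIN matches AS m ON m.id = fm.match_id
        WHERE f.path = ? AND m.ignored = 0
        ORDER BY m.kind, m.text
        """,
        (path,),
    ).fetchall()
    return [(kind, text) for kind, text in rows]
```

Because the link is keyed on `sha256` rather than `path`, calling this function for two files with identical content returns the same list.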