Supported File Types

Last updated April 2026

📄 Documents

| Common Name | MIME Type |
| --- | --- |
| PDF | application/pdf |
| Word (DOC) | application/msword |
| Word (DOCX) | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| Excel (XLS) | application/vnd.ms-excel |
| Excel (XLSX) | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
| PowerPoint (PPT) | application/vnd.ms-powerpoint |
| PowerPoint (PPTX) | application/vnd.openxmlformats-officedocument.presentationml.presentation |
| Rich Text Format (RTF) | application/rtf |
| Apple Keynote | application/x-iwork-keynote-sffkey |
| Apple Pages | application/x-iwork-pages-sffpages |
| Apple Numbers | application/x-iwork-numbers-sffnumbers |
| OpenDocument Text (ODT) | application/vnd.oasis.opendocument.text |
| OpenDocument Spreadsheet (ODS) | application/vnd.oasis.opendocument.spreadsheet |
| OpenDocument Presentation (ODP) | application/vnd.oasis.opendocument.presentation |
| Hancom Hangul Word Processor | application/x-hwp |
| EPUB | application/epub+zip |
| HTML | text/html |
| Plain text | text/plain |
| Markdown | text/x-markdown |
| CSV | text/csv |

🗃️ Archives / Packages

| Common Name | MIME Type |
| --- | --- |
| ZIP | application/zip |
| GZIP | application/x-gzip |
| BZIP2 | application/x-bzip2 |
| 7z | application/x-7z-compressed |
| RAR | application/x-rar-compressed |
| TAR | application/x-tar |
| ISO image | application/x-iso9660-image |
| JAR | application/java-archive |
| Apple DMG | application/x-apple-diskimage |
| CPIO | application/x-cpio |

🖼️ Images (with OCR text extraction)

| Common Name | MIME Type |
| --- | --- |
| JPEG | image/jpeg |
| PNG | image/png |
| GIF | image/gif |
| TIFF | image/tiff |
| BMP | image/bmp |
| BMP (Windows) | image/x-ms-bmp |
| ICO | image/x-icon |
| Photoshop (PSD) | image/vnd.adobe.photoshop |
| GIMP (XCF) | image/x-xcf |
| AutoCAD (DWG) | image/vnd.dwg |

👨‍💻 Code / Source Files

| Common Name | MIME Type |
| --- | --- |
| Java source | text/x-java-source |
| Java class | application/java-vm |
| C source | text/x-c |
| C++ source | text/x-c++src |
| Python source | text/x-python |
| JavaScript | application/javascript |
| Shell script | text/x-shellscript |
| PHP | application/x-php |
| Object code | application/x-object |

Any other plain-text file type is also supported.


✉️ Email / Messaging

| Common Name | Extension(s) | MIME Type |
| --- | --- | --- |
| Email (EML) | .eml | message/rfc822 |
| Outlook Message (MSG) | .msg | application/vnd.ms-outlook |
| Gmail / Thunderbird mailbox | .mbox | application/mbox |
| Outlook data file | .pst, .ost | application/vnd.ms-outlook-pst |

.mbox archives are treated as containers: each message inside becomes its own scan unit, just like an entry inside a .zip. Findings stream to disk per message, and the file_path in JSONL output carries the message ordinal (plus the Message-ID, when present), for example mail.mbox::message-000042::<[email protected]>. Headers (From, To, Cc, Bcc, Subject, Reply-To), bodies, and decoded attachments are all scanned. The streaming reader handles multi-gigabyte mbox files (such as Gmail Takeout exports) without loading the whole file into memory. See Scan Gmail for PII for a walkthrough.
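
To make the per-message scan unit concrete, here is a minimal sketch using Python's standard `mailbox` module. It is not the product's reader: the helper names (`unit_label`, `message_text`, `iter_scan_units`) are hypothetical, and the 1-based, six-digit ordinal is an assumption inferred from the example path above. The stdlib reader first indexes message offsets and then parses one message at a time, so it only approximates the streaming behavior described here.

```python
import mailbox
import os

# Headers listed in the paragraph above.
SCANNED_HEADERS = ("From", "To", "Cc", "Bcc", "Subject", "Reply-To")


def unit_label(mbox_name, ordinal, message_id=None):
    """Build a label shaped like mail.mbox::message-000042::<id>."""
    label = f"{mbox_name}::message-{ordinal:06d}"
    if message_id:
        label += f"::{str(message_id).strip()}"
    return label


def message_text(msg):
    """Collect the listed headers plus every decoded leaf part."""
    lines = [f"{name}: {msg[name]}" for name in SCANNED_HEADERS
             if msg[name] is not None]
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            lines.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the message
            lines.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(lines)


def iter_scan_units(path):
    """Yield (label, text) one message at a time.

    Assumption: ordinals start at 1; the real tool may number differently.
    """
    name = os.path.basename(path)
    box = mailbox.mbox(path, create=False)
    try:
        for ordinal, msg in enumerate(box, start=1):
            yield unit_label(name, ordinal, msg["Message-ID"]), message_text(msg)
    finally:
        box.close()
```

A caller would loop with `for label, text in iter_scan_units("mail.mbox"):` and hand each `text` to a detector. The sketch decodes every attachment as text; a real scanner would presumably run file-type detection on binary attachments instead.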


🗄️ Database flat-files

| Common Name | Extension(s) |
| --- | --- |
| SQLite | .sqlite, .sqlite3, .db, .db3 |
| Microsoft Access | .mdb, .accdb |

Database files are opened read-only, every user table is dumped to text, and the resulting text is run through PII detection.
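
For the SQLite case, the open-read-only-then-dump flow can be sketched with Python's standard `sqlite3` module. This is an illustration under assumptions, not the product's implementation: the function name `dump_user_tables` and the tab-separated `column=value` text layout are hypothetical, and "user table" is taken to mean any table whose name does not start with the reserved `sqlite_` prefix.

```python
import sqlite3
from pathlib import Path


def dump_user_tables(db_path):
    """Yield (table_name, row_text) for every row of every user table."""
    # mode=ro asks SQLite to open the file read-only, so the scan cannot
    # modify the database being inspected.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        for table in names:
            if table.startswith("sqlite_"):
                continue  # internal bookkeeping tables, not user data
            quoted = '"' + table.replace('"', '""') + '"'  # safe identifier quoting
            cursor = conn.execute(f"SELECT * FROM {quoted}")
            columns = [col[0] for col in cursor.description]
            for row in cursor:
                # Hypothetical text layout: one line per row, column=value pairs.
                yield table, "\t".join(f"{c}={v}" for c, v in zip(columns, row))
    finally:
        conn.close()
```

Each yielded `row_text` could then be passed to whatever detector is in use. Because the connection is read-only, any write attempted through it fails with `sqlite3.OperationalError`.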
