Data Types

PII Crawler currently supports the following PII data types. We plan to support all PII data types defined by CPPA.

U.S. Social Security Number (SSN)

A nine-digit number, usually written in the format NNN-NN-NNNN. The three-digit prefix (the area number) used to carry meaning, but that meaning was removed by the randomization process introduced on June 25, 2011, which opened previously unassigned area numbers for assignment, excluding area numbers 000, 666, and 900-999.

  • Valid format (although this specific number is not a valid SSN): 078-05-1120
  • Not valid (area number 666 is excluded): 666-12-1234
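
The format and area-number rules above can be sketched as a short check. This is a minimal illustration, not PII Crawler's actual implementation: the function name `is_plausible_ssn` and the requirement that dashes be present are assumptions, and the check does not know about individually invalidated numbers such as 078-05-1120.

```python
import re

# Assumed layout: NNN-NN-NNNN with dashes required.
SSN_RE = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")


def is_plausible_ssn(text: str) -> bool:
    """Return True if text has the SSN layout and an assignable area number."""
    match = SSN_RE.fullmatch(text)
    if match is None:
        return False
    area = int(match.group(1))
    # Area numbers 000, 666 and 900-999 are excluded from assignment.
    return area != 0 and area != 666 and area < 900


print(is_plausible_ssn("078-05-1120"))  # True (format and area number pass)
print(is_plausible_ssn("666-12-1234"))  # False (area number 666)
```

A check like this only narrows down candidates; it cannot tell whether a number was ever issued.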

U.S. City, State, Zip Cluster (CSZ)

A cluster is a set of distinct pieces of data that mean little on their own but become meaningful when found or linked together.

For example, 90210 by itself doesn’t mean much, but when 90210 is found near the words Beverly Hills, we know we have a city and ZIP code. PII Crawler uses this clustering method to find cities, states, and ZIP codes.
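
The clustering idea can be sketched as follows. This is a simplified illustration under stated assumptions, not PII Crawler's actual algorithm: the lookup table `ZIP_TO_CITY`, the function name `find_csz_clusters`, and the 40-character `window` are all hypothetical.

```python
import re

# Hypothetical lookup table; a real one would cover every ZIP code.
ZIP_TO_CITY = {"90210": ("Beverly Hills", "CA")}

# A standalone run of exactly five digits.
ZIP_RE = re.compile(r"\b[0-9]{5}\b")


def find_csz_clusters(text, window=40):
    """Return (city, state, zip) for each known ZIP found near its city name."""
    clusters = []
    for match in ZIP_RE.finditer(text):
        zip_code = match.group()
        if zip_code not in ZIP_TO_CITY:
            continue
        city, state = ZIP_TO_CITY[zip_code]
        # Inspect the text within `window` characters of the ZIP code.
        start = max(0, match.start() - window)
        nearby = text[start:match.end() + window]
        # The state comes from the table; only the city name is checked here.
        if city.lower() in nearby.lower():
            clusters.append((city, state, zip_code))
    return clusters


print(find_csz_clusters("Ship to Beverly Hills, CA 90210"))
print(find_csz_clusters("Order 90210 shipped"))  # no city nearby: []
```

The second call returns an empty list because the five digits appear without a matching city name, which is exactly the case where a lone number should not be reported.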

Street Address

Meaningful street addresses are often found near CSZ clusters.

First Name

PII Crawler uses lists of common names and named-entity recognition (NER) techniques to find first names.

Last Name

PII Crawler uses lists of common names and named-entity recognition (NER) techniques to find last names.

Email Address

PII Crawler uses a finite-state machine (FSM) to find email addresses.
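
To illustrate how a state machine can recognize an email address, here is a minimal sketch for a deliberately simplified grammar (local part, "@", then dot-separated domain labels). The state names, the allowed character sets, and the functions `is_email` and `find_emails` are assumptions made for this example; they do not describe PII Crawler's actual machine.

```python
import string

# Simplified character classes (assumptions, much narrower than the RFCs).
LOCAL_CHARS = set(string.ascii_letters + string.digits + "._%+-")
DOMAIN_CHARS = set(string.ascii_letters + string.digits + "-")


def is_email(token):
    """Run the state machine over token; True if it ends in an accepting state."""
    state = "START"
    seen_dot = False  # the domain needs at least one dot, e.g. example.com
    for ch in token:
        if state == "START":
            if ch not in LOCAL_CHARS:
                return False
            state = "LOCAL"
        elif state == "LOCAL":
            if ch == "@":
                state = "AT"
            elif ch not in LOCAL_CHARS:
                return False
        elif state in ("AT", "DOT"):
            # Each domain label must start with a domain character.
            if ch not in DOMAIN_CHARS:
                return False
            state = "DOMAIN"
        elif state == "DOMAIN":
            if ch == ".":
                state = "DOT"
                seen_dot = True
            elif ch not in DOMAIN_CHARS:
                return False
    return state == "DOMAIN" and seen_dot


def find_emails(text):
    """Split on whitespace, trim surrounding punctuation, keep accepted tokens."""
    tokens = (word.strip(".,;:()<>\"'") for word in text.split())
    return [token for token in tokens if is_email(token)]


print(find_emails("Contact jane.doe@example.com, or call."))
```

An explicit state machine makes each rejection reason easy to trace (for example, a second "@" fails in the AT state), which is one reason to prefer it over a single opaque pattern.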

U.S. Passport

A passport number begins with a letter followed by eight digits.
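
The letter-plus-eight-digits pattern can be expressed as a regular expression. This is a minimal sketch, not PII Crawler's actual detector: the function name `find_passport_candidates` is hypothetical, and accepting both uppercase and lowercase letters is an assumption.

```python
import re

# One ASCII letter followed by exactly eight digits, not embedded in a longer token.
PASSPORT_RE = re.compile(r"\b[A-Za-z][0-9]{8}\b")


def find_passport_candidates(text):
    """Return every substring that matches the letter-plus-eight-digits shape."""
    return PASSPORT_RE.findall(text)


print(find_passport_candidates("Passport no. A12345678 issued"))  # ['A12345678']
print(find_passport_candidates("Ref AB12345678"))                 # []
```

Matches are only candidates: many unrelated identifiers share this shape, so nearby context (such as the word "passport") would be needed to raise confidence.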

