Quasi-Identifier
Data points that appear anonymous alone but can identify individuals when combined
A quasi-identifier (QI) is an attribute, or a combination of attributes, that does not uniquely identify a person on its own but can be combined with external information to re-identify individuals. Unlike direct identifiers such as Social Security numbers or email addresses, quasi-identifiers are often demographic or contextual data points (ZIP code, birth date, gender, job title) that appear innocuous in isolation but become powerful identification tools when linked together.
The risk of a quasi-identifier is driven largely by its cardinality: the number of distinct values it can take. A binary attribute like gender (2 values) has low cardinality and limited identification power. But a precise birth date (over 36,500 possible values in a century) or a specific job title can dramatically narrow down a population. When multiple quasi-identifiers are combined, the number of possible combinations multiplies. Once that number exceeds the size of the population, most combinations are held by at most one person, so even a few moderate-cardinality attributes can make most individuals unique. How values are distributed also matters: rare values identify people more readily than common ones.
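The multiplication described above can be sketched in a few lines of Python. The cardinalities and the population figure below are illustrative assumptions rather than measured values, and the calculation treats attributes as independent and uniformly distributed, which real data is not. It therefore gives only an upper bound on the combination space.

```python
from math import prod

# Illustrative cardinalities (assumed for this sketch, not measured values).
cardinalities = {
    "gender": 2,
    "birth_date": 36_525,  # days in a 100-year span, including leap days
    "zip_code": 40_000,    # assumed order of magnitude for 5-digit ZIP codes
}


def combination_space(cards):
    """Upper bound on the number of distinct value combinations."""
    return prod(cards.values())


def expected_people_per_cell(population, cards):
    """Average number of people per combination under a uniform assumption."""
    return population / combination_space(cards)


space = combination_space(cardinalities)
density = expected_people_per_cell(330_000_000, cardinalities)

print(f"possible combinations: {space:,}")
print(f"average people per combination: {density:.3f}")
```

When the average number of people per combination falls below one, most occupied combinations are expected to contain a single person, which is the uniqueness effect the paragraph describes.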
Research demonstrates this effect starkly: Latanya Sweeney's analysis of 1990 U.S. Census data estimated that 87% of the U.S. population could be uniquely identified using only three quasi-identifiers: 5-digit ZIP code, gender, and full date of birth. In modern high-dimensional datasets with hundreds of attributes, even a small subset of 5 to 10 attributes can often function as a unique fingerprint.
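One way to check this effect on a concrete dataset is to count how many records have a quasi-identifier combination that no other record shares. The sketch below uses only the standard library. The field names (`zip`, `gender`, `dob`) and the sample records are hypothetical.

```python
from collections import Counter


def uniqueness_rate(records, quasi_identifiers):
    """Fraction of records whose QI combination appears exactly once."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing a QI combination."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())


# Hypothetical sample data for illustration only.
records = [
    {"zip": "02139", "gender": "F", "dob": "1980-03-14"},
    {"zip": "02139", "gender": "F", "dob": "1980-03-14"},
    {"zip": "02139", "gender": "M", "dob": "1975-07-01"},
    {"zip": "94110", "gender": "F", "dob": "1991-11-30"},
]

print(uniqueness_rate(records, ["gender"]))
print(uniqueness_rate(records, ["zip", "gender", "dob"]))
```

Adding attributes to the list can only keep the uniqueness rate the same or raise it, which is why each extra quasi-identifier released with a dataset increases re-identification risk.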
Regulators treat quasi-identifiers with significant scrutiny. HIPAA's Safe Harbor method requires removal or generalization of 18 identifier types, including geographic subdivisions smaller than a state and all date elements (except year) directly related to an individual. GDPR takes a broader approach, defining personal data as information relating to an "identifiable" person, including those identifiable "indirectly" via combinations of factors. Under Recital 26, the GDPR standard asks whether identification is possible using "means reasonably likely to be used."
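Generalization of the kind Safe Harbor describes can be sketched as follows. This is a minimal illustration and not a complete or compliant Safe Harbor implementation: it covers only two of the 18 identifier types, the field names are hypothetical, and `low_population_prefixes` is an assumed input that the caller would have to build from census data (Safe Harbor permits keeping the first three ZIP digits only for areas with more than 20,000 people, and otherwise requires 000).

```python
def generalize_record(record, low_population_prefixes=frozenset()):
    """Coarsen ZIP code to a 3-digit prefix and birth date to year only.

    Assumes record["zip"] is a 5-digit string and record["dob"] is an
    ISO date string (YYYY-MM-DD). Both field names are hypothetical.
    """
    # Copy every field except the two being generalized.
    out = {k: v for k, v in record.items() if k not in ("zip", "dob")}
    prefix = record["zip"][:3]
    # Sparsely populated prefixes are replaced with "000".
    out["zip3"] = "000" if prefix in low_population_prefixes else prefix
    out["birth_year"] = record["dob"][:4]
    return out


# Hypothetical record for illustration only.
original = {"zip": "02139", "gender": "F", "dob": "1980-03-14"}
print(generalize_record(original))
```

Generalizing trades precision for larger groups: many more people share a 3-digit ZIP prefix and a birth year than share a full ZIP code and an exact birth date.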
The danger of quasi-identifiers lies in their ubiquity and their linkability to external datasets. Voter registration records, social media profiles, and commercial databases all contain overlapping quasi-identifiers that can serve as join keys. A dataset that appears anonymous internally may become fully identified when linked to publicly available auxiliary information. Liability Quant's methodology specifically quantifies quasi-identifier density and cardinality to assess this linkage potential across known external datasets.