Sensitive PII
Category of personal data whose exposure poses elevated risk of discrimination, identity theft, or physical harm
Sensitive PII (Sensitive Personal Information) is a sub-category of personal data that, if exposed or misused, poses a high risk of material harm to an individual—such as discrimination, identity theft, or physical threat. While standard PII (e.g., name, work email) is protected under general privacy laws, Sensitive PII is subject to stricter legal treatment, often requiring explicit opt-in consent or granting individuals special rights to limit its use.
Different jurisdictions use different terms and compliance models. Under GDPR Article 9, "Special Category Data" is generally prohibited from processing unless an explicit exception applies, such as explicit consent. California's CPRA defines "Sensitive Personal Information" (SPI) and grants a "Right to Limit"—users can restrict data use to necessary business functions. Virginia's VCDPA and Colorado's CPA require affirmative opt-in consent before collecting sensitive data. HIPAA classifies health information as "Protected Health Information" (PHI) with strict authorization requirements.
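As a rough illustration, these compliance models can be encoded as a policy table. The sketch below is a simplification with hypothetical names (`ConsentModel`, `may_process`); real applicability analysis depends on scoping rules, exceptions, and thresholds that no lookup table captures.

```python
from enum import Enum

class ConsentModel(Enum):
    """Simplified compliance models for sensitive data (see above)."""
    PROHIBITED_UNLESS_EXCEPTION = "gdpr_art9"  # GDPR: processing banned absent an Art. 9(2) exception
    RIGHT_TO_LIMIT = "cpra_spi"                # CPRA: user may restrict use to necessary functions
    OPT_IN_CONSENT = "vcdpa_cpa"               # VCDPA/CPA: affirmative consent before collection

# Hypothetical jurisdiction-to-model table; real scoping is far more nuanced.
JURISDICTION_MODEL = {
    "EU": ConsentModel.PROHIBITED_UNLESS_EXCEPTION,
    "US-CA": ConsentModel.RIGHT_TO_LIMIT,
    "US-VA": ConsentModel.OPT_IN_CONSENT,
    "US-CO": ConsentModel.OPT_IN_CONSENT,
}

def may_process(jurisdiction: str, has_consent: bool, limit_requested: bool = False) -> bool:
    """Very rough gate: opt-in regimes require consent up front, while the
    CPRA model allows processing until the user exercises the Right to Limit."""
    model = JURISDICTION_MODEL.get(jurisdiction)
    if model in (ConsentModel.PROHIBITED_UNLESS_EXCEPTION, ConsentModel.OPT_IN_CONSENT):
        return has_consent
    if model is ConsentModel.RIGHT_TO_LIMIT:
        return not limit_requested
    return False  # unknown jurisdiction: fail closed
```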
Modern definitions have expanded beyond static identifiers (SSN, passport numbers) to include dynamic indicators. Precision geolocation—GPS data revealing visits to medical clinics, places of worship, or political gatherings—is treated as inherently sensitive. The FTC has designated location data revealing "sensitive locations" as subject to heightened scrutiny, pursuing enforcement against data brokers like Kochava and X-Mode.
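A minimal sketch of how a data pipeline might screen GPS pings against such locations, assuming a hypothetical coordinate list (`SENSITIVE_SITES`) and a fixed proximity radius; production systems rely on proper geofencing and curated location databases rather than a hand-maintained list.

```python
import math

# Hypothetical coordinates of sensitive sites (clinic, place of worship, ...).
SENSITIVE_SITES = [
    (41.8919, -87.6051),  # example: a medical clinic
    (41.8827, -87.6233),  # example: a place of worship
]
RADIUS_M = 100.0  # proximity threshold; real programs tune this per site type

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_sensitive_ping(lat, lon):
    """Flag a GPS ping within RADIUS_M of any sensitive site so it can be
    dropped or coarsened before storage or sale."""
    return any(haversine_m(lat, lon, slat, slon) <= RADIUS_M
               for slat, slon in SENSITIVE_SITES)
```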
Inferred sensitive data presents a growing challenge. The European Court of Justice ruled in 2024 (Case C-21/23, Lindenapotheke) that data about pharmacy purchases constitutes "health data" because it allows deduction of health status. If a dataset enables an analyst to infer a sensitive attribute with high accuracy, that dataset is legally treated as sensitive even without containing explicit medical records or SSNs. This creates liability for organizations whose ML models derive sensitive characteristics from ostensibly innocuous data.
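One way to operationalize this standard is an attribute-inference audit: train a model on the ostensibly innocuous columns and measure how well it predicts the sensitive attribute. The sketch below assumes scikit-learn and an integer-encoded sensitive label; the `inference_risk` helper and its accuracy margin are illustrative policy choices, not a legal test.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def inference_risk(X_innocuous: np.ndarray, y_sensitive: np.ndarray,
                   margin: float = 0.10) -> bool:
    """Attribute-inference audit: if ostensibly non-sensitive columns predict
    a sensitive attribute well above the majority-class baseline, treat the
    dataset itself as sensitive. `y_sensitive` must be integer class labels;
    `margin` is an arbitrary policy threshold."""
    baseline = np.bincount(y_sensitive).max() / len(y_sensitive)
    acc = cross_val_score(GradientBoostingClassifier(), X_innocuous,
                          y_sensitive, cv=5, scoring="accuracy").mean()
    return acc - baseline > margin
```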
Proxy variables create additional risk. Machine learning models may use non-sensitive data points that closely correlate with protected characteristics—using zip code and magazine subscriptions as proxies for race, or work hours as proxies for gender. The EU AI Act and recent FTC guidance treat the use of proxy variables for discriminatory outcomes as violations of both privacy and consumer protection laws.
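Proxy screening can be approximated by ranking nominally neutral features by their statistical dependence on a protected attribute. The sketch below uses mutual information via scikit-learn and assumes integer-encoded categorical columns; `find_proxy_candidates` and its threshold are illustrative, and high scorers warrant manual review rather than automatic removal.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

def find_proxy_candidates(df: pd.DataFrame, protected: str,
                          threshold: float = 0.05) -> pd.Series:
    """Rank nominally neutral columns by mutual information with a protected
    attribute; high scorers (e.g., zip code vs. race) are proxy candidates.
    All columns must already be integer-encoded. The threshold is a policy
    choice, not a legal test."""
    features = df.drop(columns=[protected])
    mi = mutual_info_classif(features, df[protected], discrete_features=True)
    scores = pd.Series(mi, index=features.columns).sort_values(ascending=False)
    return scores[scores > threshold]
```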
GDPR Article 9 categories include: racial/ethnic origin, political opinions, religious/philosophical beliefs, trade union membership, genetic data, biometric data processed for unique identification, health data, and sex life/orientation. CPRA's SPI list adds: SSN, driver's license/passport numbers, account log-in credentials, precise geolocation, and the contents of mail, email, and text messages.
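For dataset inventories, these lists are often mirrored as machine-readable tags so that columns can be labeled and audited. The constants below are illustrative names for that purpose, not a legal taxonomy.

```python
# Category tags mirroring the lists above, for labeling columns in a data
# inventory. Names are illustrative, not statutory definitions.
GDPR_ART9 = frozenset({
    "racial_ethnic_origin", "political_opinions", "religious_beliefs",
    "trade_union_membership", "genetic_data", "biometric_identification",
    "health_data", "sex_life_orientation",
})
CPRA_SPI_EXTRA = frozenset({
    "ssn", "drivers_license_passport", "account_credentials",
    "precise_geolocation", "mail_email_text_contents",
})
SENSITIVE_CATEGORIES = GDPR_ART9 | CPRA_SPI_EXTRA
```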
For liability quantification, sensitive PII acts as a severity multiplier on dataset toxicity. Biometric templates carry the highest multiplier due to their immutability and BIPA's private right of action (statutory damages of $1,000 per negligent violation and $5,000 per intentional or reckless violation). Precision geolocation, inferred health status, and government IDs each escalate risk scores significantly. Technical controls like differential privacy, k-anonymity, and encrypted processing can reduce but not eliminate the elevated liability associated with sensitive data categories.
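A sketch of this multiplier model follows. The numeric weights are invented for illustration (only the ordering, with biometrics highest, follows the text), and the control discounts are placeholders that reduce, but never zero out, the score.

```python
# Illustrative severity multipliers; values are invented, ordering follows
# the text above (biometric templates highest).
SEVERITY = {
    "biometric_identification": 10.0,
    "precise_geolocation": 6.0,
    "health_data": 6.0,
    "ssn": 5.0,
    "drivers_license_passport": 5.0,
    "default_sensitive": 3.0,
}
# Mitigation discounts for technical controls: they reduce liability but
# never eliminate it, so no discount reaches zero.
CONTROL_DISCOUNT = {"differential_privacy": 0.5, "k_anonymity": 0.7,
                    "encrypted_processing": 0.6}

def toxicity_score(base: float, categories: set[str], controls: set[str]) -> float:
    """Scale a dataset's base toxicity by its worst sensitive category, then
    apply a multiplicative discount for each deployed technical control."""
    mult = max((SEVERITY.get(c, SEVERITY["default_sensitive"]) for c in categories),
               default=1.0)
    score = base * mult
    for c in controls:
        score *= CONTROL_DISCOUNT.get(c, 1.0)
    return score

# Example: a biometric dataset protected only by k-anonymity still scores
# far above its non-sensitive baseline.
print(toxicity_score(1.0, {"biometric_identification"}, {"k_anonymity"}))  # 7.0
```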