LLM Privacy
Specific privacy risks in large language model AI systems
LLM Privacy encompasses the unique privacy challenges posed by large language models and other foundation models. Unlike traditional databases where data is stored discretely and can be deleted record by record, LLMs transform training data into distributed neural network weights, creating novel privacy risks that are difficult to assess, regulate, or remediate.
The fundamental concern is training data memorization. Research has demonstrated that large language models can memorize and reproduce verbatim text from their training data, including personally identifiable information. Studies have extracted credit card numbers, phone numbers, email addresses, physical addresses, and other sensitive information from models through carefully crafted prompts. This memorization risk increases with model size and training data repetition—the same personal information appearing in multiple training documents is more likely to be recoverable.
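The memorization risk described above can be probed directly: prompt the model with the prefix of a string suspected to be in the training data and check whether it regenerates the suffix verbatim. The sketch below illustrates the idea; `generate` is a hypothetical stand-in for any LLM completion call, and the memorized string is invented for the example.

```python
# Minimal sketch of a verbatim-memorization probe. `generate` is a
# hypothetical stand-in for an LLM completion API; here it echoes one
# canned "memorized" string so the probe logic can be demonstrated.

def generate(prompt: str) -> str:
    """Stand-in for an LLM completion call; completes one memorized string."""
    memorized = "Contact Jane Doe at jane.doe@example.com or 555-0123."
    if memorized.startswith(prompt):
        return memorized[len(prompt):]
    return ""

def probe_memorization(training_string: str, prefix_len: int = 20) -> bool:
    """True if the model regenerates the suffix verbatim from the prefix."""
    prefix = training_string[:prefix_len]
    suffix = training_string[prefix_len:]
    completion = generate(prefix)
    return completion.startswith(suffix)

candidate = "Contact Jane Doe at jane.doe@example.com or 555-0123."
print(probe_memorization(candidate))  # True when the string is memorized
```

Real extraction studies run this kind of probe at scale over many candidate prefixes and sampling strategies rather than a single exact-match check.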
Several distinct attack vectors exploit LLM privacy vulnerabilities. Training data extraction prompts the model to complete or regenerate memorized sequences, potentially revealing personal information from training documents. Membership inference attacks determine whether specific data was used in training, potentially exposing that an individual's information was collected without consent. Model inversion attacks attempt to reconstruct training data characteristics from model outputs, creating privacy risks even when direct extraction fails. Prompt injection attacks can manipulate models into revealing system prompts, fine-tuning data, or other information the deployer intended to keep confidential.
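A common membership inference technique works by thresholding the model's loss: records the model saw in training tend to receive unusually low loss (high likelihood) compared with unseen records. The sketch below shows the thresholding logic only; the loss values are illustrative numbers, not real model output.

```python
# Sketch of a loss-threshold membership inference attack. The attacker
# scores a candidate record with the model's loss and compares it against
# a threshold calibrated on records known NOT to be in the training set.
# All loss values here are illustrative assumptions.

def calibrate_threshold(nonmember_losses, percentile=0.05):
    """Pick a threshold below which almost no known non-member falls."""
    ranked = sorted(nonmember_losses)
    idx = max(0, int(len(ranked) * percentile) - 1)
    return ranked[idx]

def is_probable_member(candidate_loss, threshold):
    """Flag a record as a likely training member if its loss is unusually low."""
    return candidate_loss < threshold

# Losses measured on records known to be outside the training data:
nonmember_losses = [2.8, 3.1, 2.6, 3.4, 2.9, 3.0, 2.7, 3.3, 2.5, 3.2]
threshold = calibrate_threshold(nonmember_losses)

print(is_probable_member(0.6, threshold))  # unusually low loss: likely member
print(is_probable_member(3.0, threshold))  # typical loss: likely non-member
```

Published attacks refine this with shadow models and per-example calibration, but the core signal remains the gap between member and non-member likelihoods.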
These challenges are amplified by the unprecedented scale of LLM training data. Foundation models are typically trained on billions of documents scraped from the internet, including personal information published on websites, forums, social media, and data breach dumps. The sheer volume makes it practically impossible to audit training data for PII before training, to respond meaningfully to individual erasure requests, or to determine what personal information a model may have learned. Organizations deploying LLMs may inherit liability for privacy violations committed during a training process they had no control over.
The difficulty of data deletion creates tension with privacy rights. When a data subject exercises their right to erasure under GDPR, an organization using traditional databases can delete the relevant records. For an LLM, deleting the training data does not remove learned patterns from model weights. True erasure would require either full retraining without the specified data—prohibitively expensive for frontier models—or machine unlearning techniques that remain immature. This creates a fundamental compliance gap: organizations may be legally obligated to fulfill erasure requests they cannot technically satisfy.
Regulators have begun scrutinizing LLM training practices. The Italian data protection authority (Garante) temporarily banned ChatGPT in 2023, citing concerns about lawful basis for processing personal data in training, lack of age verification, and inability to satisfy erasure requests. Other European DPAs launched coordinated investigations. The European Data Protection Board established a ChatGPT task force to develop harmonized approaches. The EDPB's December 2024 Opinion 28/2024 addressed anonymization claims in AI models, stating that whether a model is "anonymous" must be assessed case-by-case considering whether personal data can be extracted or regenerated.
Technical mitigations are emerging but incomplete. Differential privacy applied during training limits how much any individual data point can influence the model, providing formal guarantees but degrading model performance. Output filtering can detect and block memorized sequences before they reach users. Federated learning trains models on distributed data without centralizing it, reducing some collection risks. Fine-tuning guardrails can prevent models from revealing certain categories of information. However, none of these approaches fully address the privacy risks inherent in training on internet-scale data collected without consent for AI use.
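Of these mitigations, output filtering is the most straightforward to sketch: before a generated response is returned, it is checked against an index of sensitive sequences that memorization testing has already surfaced, and any match is redacted. The blocklist entries below are invented examples; production filters typically combine such exact-match lists with PII pattern detectors.

```python
# Minimal sketch of output filtering against known-memorized sequences.
# The blocklist contents are illustrative assumptions; a real deployment
# would populate it from memorization testing and add pattern-based
# PII detection on top.

SENSITIVE_SEQUENCES = {
    "jane.doe@example.com",   # invented example address
    "4111 1111 1111 1111",    # well-known test card number format
}

def filter_output(text: str, redaction: str = "[REDACTED]") -> str:
    """Replace any known-memorized sensitive sequence before it reaches the user."""
    for seq in SENSITIVE_SEQUENCES:
        if seq in text:
            text = text.replace(seq, redaction)
    return text

raw = "Sure, you can reach her at jane.doe@example.com for details."
print(filter_output(raw))  # the address is replaced with [REDACTED]
```

Exact-match filtering only catches sequences already known to be memorized, which is why it complements rather than replaces training-time defenses like differential privacy.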
For organizations building, fine-tuning, or deploying LLMs, the liability landscape is evolving rapidly. Training data provenance documentation, consent chain verification, and technical memorization testing are becoming baseline due diligence requirements. Model disgorgement—the FTC's remedy requiring deletion of algorithms trained on illegally collected data—represents the regulatory worst-case for AI investments built on questionable data foundations.
Sources
- Carlini, N., et al. (2021). Extracting Training Data from Large Language Models. USENIX Security Symposium.
- Shokri, R., et al. (2017). Membership Inference Attacks Against Machine Learning Models. IEEE Symposium on Security and Privacy.
- EDPB. (2024). Opinion 28/2024 on AI Models and Personal Data.
- Italian Garante v. OpenAI. (2024). ChatGPT Data Protection Investigation.