Module 5 · Data — The Fuel of AI
Privacy & Governance: Consent, DPDP & GDPR
65 min
Learning objectives
- Identify personally identifiable information (PII) and explain why it carries special obligations
- Distinguish anonymization from pseudonymization and understand their limits
- Summarize the core principles shared by the DPDP Act and GDPR and what they require of practitioners
Data about people is different
Much of the data that fuels AI describes real people — customers, patients, employees. That data is governed by law and by basic ethics. A practitioner cannot treat personal data as just another input; mishandling it risks harm to individuals and serious legal and reputational consequences for the organization.
PII (Personally Identifiable Information) — Any data that can identify a specific person, on its own or in combination with other data.
PII is broader than people expect. A name or national ID number identifies someone directly. But seemingly harmless fields can identify someone in combination — postal code plus date of birth plus gender is often enough to single out one individual. IP addresses, device IDs, and location traces can also count as personal data.
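To make the "in combination" point concrete, here is a minimal sketch that counts how many records are singled out by a set of fields. The records, field names, and the `unique_count` helper are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy records: no names or ID numbers, only "harmless" fields.
records = [
    {"postal_code": "560001", "birth_date": "1990-04-12", "gender": "F"},
    {"postal_code": "560001", "birth_date": "1990-04-12", "gender": "M"},
    {"postal_code": "560001", "birth_date": "1985-11-03", "gender": "F"},
    {"postal_code": "110021", "birth_date": "1985-11-03", "gender": "F"},
    {"postal_code": "110021", "birth_date": "1985-11-03", "gender": "F"},
]


def unique_count(rows, fields):
    """Count rows whose combination of `fields` appears exactly once."""
    groups = Counter(tuple(row[f] for f in fields) for row in rows)
    return sum(1 for size in groups.values() if size == 1)


postal_only = unique_count(records, ["postal_code"])
combined = unique_count(records, ["postal_code", "birth_date", "gender"])
print(postal_only, combined)  # prints: 0 3
```

In this toy data, postal code alone singles out nobody, while the three fields together single out three of the five records.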
Reducing identifiability: anonymization vs. pseudonymization
| Technique | What it does | Reversible? |
|---|---|---|
| Pseudonymization | Replaces identifiers with tokens, but a key can re-link them | Yes — with the key, so it is still personal data |
| Anonymization | Irreversibly strips identifying detail so no individual can be re-identified | No — if truly anonymized, it falls outside most privacy law |
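The pseudonymization row can be sketched as follows. This is one simple approach under stated assumptions, not a production design: the `pseudonymize` and `relink` helpers and the `email` field are hypothetical, and the "key" here is a plain lookup table from token to original identifier.

```python
import secrets


def pseudonymize(rows, id_field):
    """Replace `id_field` with a random token; return new rows and the key."""
    key = {}   # token -> original identifier; store this separately and securely
    seen = {}  # original identifier -> token, so repeats get the same token
    out = []
    for row in rows:
        original = row[id_field]
        if original not in seen:
            token = secrets.token_hex(8)
            seen[original] = token
            key[token] = original
        out.append({**row, id_field: seen[original]})
    return out, key


def relink(rows, id_field, key):
    """Anyone holding the key can restore the original identifiers."""
    return [{**row, id_field: key[row[id_field]]} for row in rows]


# Hypothetical data for illustration only.
customers = [
    {"email": "a@example.com", "orders": 3},
    {"email": "b@example.com", "orders": 1},
    {"email": "a@example.com", "orders": 2},
]
pseudo, key = pseudonymize(customers, "email")
restored = relink(pseudo, "email", key)
```

Because `relink` recovers the original rows exactly, the tokenized dataset is still personal data for as long as the key exists.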
Watch out
True anonymization is hard. 'Anonymized' datasets have been re-identified by combining them with other public data. Removing the name column is not anonymization — quasi-identifiers can still point to a person.
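A minimal sketch of how such a re-identification can work: a release with names removed is joined to a separate list that carries names, using only the quasi-identifiers. Both datasets, the field names, and the `link` helper are hypothetical.

```python
# "Anonymized" release: the name column is gone, quasi-identifiers remain.
released = [
    {"postal_code": "560001", "birth_date": "1990-04-12", "gender": "F",
     "diagnosis": "asthma"},
    {"postal_code": "110021", "birth_date": "1985-11-03", "gender": "M",
     "diagnosis": "diabetes"},
]

# Hypothetical public list that includes names and the same fields.
public = [
    {"name": "Person A", "postal_code": "560001",
     "birth_date": "1990-04-12", "gender": "F"},
    {"name": "Person B", "postal_code": "400001",
     "birth_date": "1979-01-30", "gender": "M"},
]

QUASI_IDENTIFIERS = ("postal_code", "birth_date", "gender")


def link(released_rows, public_rows, fields=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where exactly one public row matches."""
    index = {}
    for person in public_rows:
        index.setdefault(tuple(person[f] for f in fields), []).append(person["name"])
    matches = []
    for row in released_rows:
        names = index.get(tuple(row[f] for f in fields), [])
        if len(names) == 1:  # an unambiguous match re-identifies the record
            matches.append((names[0], row["diagnosis"]))
    return matches


reidentified = link(released, public)
print(reidentified)  # prints: [('Person A', 'asthma')]
```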
Two laws every practitioner should know
GDPR is the European Union's General Data Protection Regulation (adopted 2016; applicable from May 2018). It applies to organizations established in the EU and to organizations abroad that offer goods or services to, or monitor the behaviour of, people in the EU. The most serious infringements can draw fines of up to €20 million or 4% of worldwide annual turnover, whichever is higher. India's Digital Personal Data Protection Act, 2023 (the DPDP Act) is India's national law governing personal data, built around consent and clear duties for organizations.
DPDP Act — India's Digital Personal Data Protection Act, 2023, governing how organizations (data fiduciaries) handle the personal data of individuals (data principals).
GDPR — The EU's General Data Protection Regulation (adopted 2016; applicable from 2018), a comprehensive data-protection law with extraterritorial reach and significant fines.
Shared principles
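- Lawful basis / consent — process personal data only on a valid legal basis. The DPDP Act centers on the individual's clear, informed consent, alongside a limited set of legitimate uses; GDPR treats consent as one of six lawful bases.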
- Purpose limitation — use data only for the purpose it was collected for; reusing customer data to train an unrelated model can breach this.
- Data minimization — collect only what you actually need, not everything you might one day want.
- Individual rights — people can access, correct, and request deletion of their data.
- Accountability and security — organizations must protect data and be able to demonstrate compliance.
Data governance is the set of policies, roles, and processes that keep data accurate, secure, and lawfully used across its life. It turns these principles into day-to-day practice.
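As one illustration of turning a principle into day-to-day practice, data minimization can be enforced with a per-purpose allow-list of fields. This is a sketch only: the purposes, field names, and the `minimize` helper are hypothetical.

```python
# Hypothetical allow-list: the only fields each purpose is permitted to use.
ALLOWED_FIELDS = {
    "order_updates": {"order_id", "phone"},
    "demand_forecast": {"order_id", "postal_code"},
}


def minimize(record, purpose):
    """Keep only the fields approved for `purpose`; unknown purposes raise KeyError."""
    allowed = ALLOWED_FIELDS[purpose]
    return {field: value for field, value in record.items() if field in allowed}


raw = {
    "order_id": "A-1",
    "phone": "+91-0000000000",
    "postal_code": "560001",
    "birth_date": "1990-04-12",
}
forecast_view = minimize(raw, "demand_forecast")
print(forecast_view)  # prints: {'order_id': 'A-1', 'postal_code': '560001'}
```

Raising an error for an unlisted purpose is a deliberate choice in this sketch: a new use has to be registered and reviewed before any data flows to it.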
Example — Purpose limitation in practice
A company collected phone numbers to deliver order updates. Later a team wants to use those numbers to train a marketing-response model. Under both GDPR and the DPDP Act, that is a new purpose the customer never consented to — it needs a fresh lawful basis, not a quiet repurposing.
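The scenario above can be sketched as a check that runs before data reaches a training pipeline. It assumes consent is the lawful basis being relied on and that consent is recorded per customer and per purpose; the registry, purpose names, and helpers are hypothetical.

```python
# Hypothetical consent registry: customer id -> purposes they agreed to.
consents = {
    "cust-1": {"order_updates"},
    "cust-2": {"order_updates", "marketing"},
}


def can_use(customer_id, purpose, registry):
    """True only if this customer consented to this specific purpose."""
    return purpose in registry.get(customer_id, set())


def rows_for_purpose(rows, purpose, registry):
    """Filter out rows whose owner has not consented to `purpose`."""
    return [row for row in rows if can_use(row["customer_id"], purpose, registry)]


phone_rows = [
    {"customer_id": "cust-1", "phone": "+91-0000000001"},
    {"customer_id": "cust-2", "phone": "+91-0000000002"},
    {"customer_id": "cust-3", "phone": "+91-0000000003"},  # no consent on record
]
marketing_rows = rows_for_purpose(phone_rows, "marketing", consents)
print([row["customer_id"] for row in marketing_rows])  # prints: ['cust-2']
```

Customers with no record at all are excluded by default, so a missing entry is never read as permission.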
Watch out
Not legal advice: laws evolve and details vary by jurisdiction and use case. When a project touches personal data, bring in your privacy, legal, or Data Protection Officer (DPO) function early rather than deciding alone.
Knowledge check
Quick practice — not part of your exam score.
Which statement about PII is most accurate?
A dataset has had direct identifiers replaced with tokens, but the team keeps a key that can re-link tokens to people. This data is best described as:
A company collected email addresses solely to send shipping notifications, then reuses them to train a promotional-targeting model without new consent. Which core data-protection principle does this most directly violate?