Privacy by Design
The Seven Foundational Principles
Privacy by Design, formalized by Ann Cavoukian and now embedded in GDPR Article 25, rests on seven principles that translate directly to engineering decisions:
- Proactive, not reactive - Anticipate privacy risks during design, not after incidents. Include privacy as a non-functional requirement in every technical spec.
- Privacy as the default - Systems should protect personal data automatically. Users should not have to take action to protect their privacy. Default sharing settings should be off, not on.
- Privacy embedded into design - Privacy is not an add-on feature. It is a core architecture constraint, like security or scalability.
- Full functionality (positive-sum) - Privacy and functionality are not trade-offs. Design systems where you can have both, using techniques like differential privacy, on-device processing, and federated learning.
- End-to-end security - Data protection across the entire lifecycle: collection, processing, storage, sharing, and deletion.
- Visibility and transparency - Users and auditors can verify that privacy controls are working as intended.
- Respect for user privacy - User-centric design with granular controls, easy data export, and straightforward deletion.
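The "privacy as the default" principle translates directly into how settings objects are initialized. A minimal sketch (all class and field names here are hypothetical, chosen for illustration):

```python
from dataclasses import dataclass

# Hypothetical settings object showing "privacy as the default":
# every sharing-related flag starts disabled, so a user who never
# opens the settings screen shares nothing.
@dataclass
class UserPrivacySettings:
    share_profile_publicly: bool = False   # opt-in, never opt-out
    share_usage_analytics: bool = False
    allow_marketing_email: bool = False

settings = UserPrivacySettings()
assert settings.share_profile_publicly is False
```

The point is structural: a user who takes no action ends up in the most protective state, and enabling sharing is always an explicit act.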
Data Minimization in Practice
Data minimization is not about collecting less data for the sake of it. It is about reducing attack surface and compliance scope. Every field of personal data you collect is a field you must protect, a field that can be breached, and a field that must be findable for deletion requests.
Practical techniques:
- Collect at the point of need - Do not ask for a phone number at registration if you only need it for two-factor authentication. Collect it when the user enables 2FA.
- Aggregate early - If you need analytics on user behavior, aggregate events into anonymous counts at ingestion time rather than storing individual user actions.
- Ephemeral processing - Process PII in memory and discard it. If you need to verify an ID document, extract the verified status and discard the document image.
- Field-level encryption - For PII that must be stored, encrypt individual fields so that database access alone does not expose personal data.
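The "aggregate early" technique can be sketched in a few lines. The event shape below is illustrative; the key move is that identifying fields are dropped at ingestion and only anonymous counts are persisted:

```python
from collections import Counter

# Minimal sketch of "aggregate early": instead of storing one row per
# user action (user_id, page, timestamp), increment an anonymous
# counter at ingestion time and discard the identifying fields.
raw_events = [
    {"user_id": "u1", "page": "/home"},
    {"user_id": "u2", "page": "/home"},
    {"user_id": "u1", "page": "/pricing"},
]

page_views = Counter(event["page"] for event in raw_events)

assert page_views["/home"] == 2      # only aggregate counts survive
assert page_views["/pricing"] == 1   # user_id never leaves ingestion
```

If individual events must be retained briefly (e.g. for deduplication), keep that buffer short-lived and treat it as PII until it is aggregated.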
Consent Architecture
Consent is the legal basis for most personal data processing. A well-designed consent system has three components:
- Consent collection - Granular, per-purpose consent requests that explain what data is collected, why, and how long it is retained. Presented at the point of data collection, not buried in terms of service.
- Consent ledger - An immutable record of every consent grant, withdrawal, and modification. Each entry includes the user ID, purpose, timestamp, policy version, and collection method (web form, API, verbal).
- Consent enforcement - An internal API, e.g. `canProcess(userId, purpose) -> boolean`, that every service calls before processing personal data. This centralizes consent checks and prevents data processing after consent withdrawal.
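The ledger and enforcement components fit together as follows. This is a sketch of the `canProcess(userId, purpose)` check described above: the dataclass fields mirror the ledger entry, an in-memory list stands in for an append-only store, and all names are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Sketch of a consent ledger entry: one immutable record per grant,
# withdrawal, or modification, as described above.
@dataclass(frozen=True)
class ConsentEvent:
    user_id: str
    purpose: str
    granted: bool        # True = grant, False = withdrawal
    timestamp: datetime
    policy_version: str
    method: str          # "web_form", "api", "verbal"

def can_process(ledger: list[ConsentEvent], user_id: str, purpose: str) -> bool:
    """The most recent entry for (user, purpose) wins; no entry means no consent."""
    relevant = [e for e in ledger if e.user_id == user_id and e.purpose == purpose]
    return max(relevant, key=lambda e: e.timestamp).granted if relevant else False

t0 = datetime.now(timezone.utc)
ledger = [ConsentEvent("u1", "marketing", True, t0, "v3", "web_form")]
assert can_process(ledger, "u1", "marketing")        # consent granted
assert not can_process(ledger, "u1", "analytics")    # never asked: default deny
ledger.append(ConsentEvent("u1", "marketing", False, t0 + timedelta(days=1), "v3", "api"))
assert not can_process(ledger, "u1", "marketing")    # withdrawal wins
```

Note the default-deny behavior: absence of a ledger entry means no consent, which keeps the enforcement API safe for purposes added after a user signed up.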
Anonymization vs. Pseudonymization
This distinction has real architectural consequences:
| Technique | Reversible? | GDPR Applies? | Example |
|---|---|---|---|
| Anonymization | No | No, data is no longer personal | k-anonymity, differential privacy, aggregation |
| Pseudonymization | Yes (with key/table) | Yes, data is still personal | Hashing, tokenization, encryption |
If you hash a user's email with SHA-256, that is pseudonymization. Anyone with the same email can produce the same hash. True anonymization requires that no reasonable effort can re-identify the individual. For analytics, consider techniques like differential privacy (adding calibrated noise to query results) or k-anonymity (ensuring every record is indistinguishable from at least k-1 others).
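Both points above can be demonstrated directly. The first few lines show why a hash is only a pseudonym (it is deterministic); the `noisy_count` function is a minimal differential-privacy sketch that adds Laplace noise to a count, with illustrative parameter values rather than a recommendation:

```python
import hashlib
import math
import random

# Hashing is deterministic, so a SHA-256 of an email is a pseudonym:
# anyone holding the same email (or a precomputed lookup table) can
# reproduce the token and re-link the record.
token = hashlib.sha256(b"alice@example.com").hexdigest()
assert token == hashlib.sha256(b"alice@example.com").hexdigest()

# Differential-privacy sketch: add Laplace noise calibrated to
# sensitivity / epsilon before releasing a count.
def noisy_count(true_count: int, epsilon: float = 1.0, sensitivity: float = 1.0) -> float:
    scale = sensitivity / epsilon
    u = random.random() - 0.5                  # uniform on [-0.5, 0.5)
    # Inverse-CDF sampling from Laplace(0, scale)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

Smaller `epsilon` means more noise and stronger privacy; production systems also have to budget `epsilon` across repeated queries, which this sketch does not address.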
Privacy Impact Assessments
Every feature that introduces new personal data collection or changes how existing data is processed should go through a Privacy Impact Assessment (PIA). This does not have to be a heavyweight legal process. It can be a section in your design document that answers five questions:
- What personal data does this feature collect or process?
- What is the legal basis for processing (consent, legitimate interest, contract)?
- Who has access to this data, and why?
- How long is the data retained, and how is it deleted?
- What happens if this data is breached?
Treat these answers as design constraints and enforce them in code reviews.
Key Points
- Privacy by Design means embedding privacy into system architecture from the start, not bolting it on after launch
- Data minimization is the most powerful privacy control. Do not collect data you do not need
- Purpose limitation requires that data collected for one purpose is not repurposed without explicit consent
- Anonymization is irreversible and removes data from GDPR scope entirely; pseudonymization is reversible and data remains in scope
- Privacy impact assessments (PIAs) should be part of the design review process for any feature that handles personal data
Common Mistakes
- ✗ Collecting all available user data 'just in case' it becomes useful later. This violates data minimization and creates unnecessary risk
- ✗ Confusing pseudonymization with anonymization. Hashing an email is pseudonymization (reversible with a lookup table), not anonymization
- ✗ Treating privacy as a legal team responsibility instead of an engineering design constraint
- ✗ Building analytics pipelines on raw PII when aggregated or anonymized data would serve the same purpose