Data Residency & Sovereignty
Why Data Location Matters
Ten years ago, most companies stored everything in one region and nobody asked questions. That era is over. GDPR restricts transfers of EU personal data to countries without an "adequacy decision" unless another safeguard is in place. The Schrems II ruling invalidated Privacy Shield, the main legal mechanism US companies relied on. China's PIPL requires local storage for certain operators and a regulatory mechanism, such as a security assessment, for cross-border transfers. India's DPDP Act, Russia's Federal Law 242-FZ, Brazil's LGPD, and Saudi Arabia's PDPL each impose their own localization or cross-border transfer requirements.
If your users are global, their data is subject to the laws of wherever they are, wherever the data is stored, or both. The engineering problem is building systems that respect these boundaries without turning your architecture into an unmanageable mess.
Regional Deployment Patterns
The cleanest approach is to design for multi-region from the start. Retrofitting data residency onto a single-region system is painful.
Geo-routed ingestion - Use DNS-based or load balancer-based geo-routing to send users to the nearest regional deployment. EU users hit eu-west-1, APAC users hit ap-southeast-1. Each region has its own database, application layer, and storage. Data does not cross regions unless explicitly required.
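A minimal sketch of the routing decision behind geo-routed ingestion, assuming the load balancer or edge layer already supplies a two-letter country code. The mapping table, the region names chosen for each country, and the `resolve_region` helper are illustrative assumptions, not part of any provider's API.

```python
# Hypothetical mapping from ISO 3166-1 alpha-2 country codes to regional
# deployments. A real table would cover every country you serve.
REGION_BY_COUNTRY = {
    "DE": "eu-west-1",
    "FR": "eu-west-1",
    "IE": "eu-west-1",
    "SG": "ap-southeast-1",
    "AU": "ap-southeast-1",
    "US": "us-east-1",
}


def resolve_region(country_code: str) -> str:
    """Return the regional deployment that should receive this user's data.

    Raises KeyError for unmapped countries instead of silently falling
    back to a default region, so residency gaps surface early.
    """
    return REGION_BY_COUNTRY[country_code.strip().upper()]
```

Failing loudly on an unmapped country is a deliberate choice in this sketch: a silent default region is exactly how data ends up in the wrong jurisdiction.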
Regional database replicas - If you need a global read model but must keep writes local, use regional primary databases with read replicas. The primary in eu-west-1 handles EU writes; the primary in us-east-1 handles US writes. Cross-region replication only carries non-personal data or pseudonymized data.
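One way to pseudonymize records before they enter cross-region replication is a keyed hash over the identifying fields. The sketch below uses HMAC-SHA256 from the standard library; the field names in `PII_FIELDS` and the `pseudonymize` helper are assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as personal data in this sketch.
PII_FIELDS = {"email", "name", "ip_address"}


def pseudonymize(record: dict, key: bytes) -> dict:
    """Replace PII field values with keyed HMAC-SHA256 tokens.

    The same input and key always yield the same token, so joins on the
    token still work in the global read model.
    """
    out = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            token = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            out[field] = token.hexdigest()
        else:
            out[field] = value
    return out
```

The key should stay in the originating region. If it is replicated along with the data, the tokens can be linked back to individuals by anyone who holds both.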
Shared control plane, regional data plane - Keep your management layer (authentication, billing, admin dashboards) centralized, but keep user-generated content and PII in regional data planes. This is the pattern most SaaS companies converge on because it balances operational simplicity with compliance.
Data Classification for Residency
Localizing everything is expensive and operationally complex. The smarter approach is classification:
- Strictly regulated data - PII, PHI, financial records, government IDs. Must stay in the jurisdiction the regulation specifies. Apply encryption, access controls, and regional storage.
- Business data - Revenue figures, product analytics, aggregated metrics. Usually not subject to residency rules. Can be centralized for reporting.
- Public data - Marketing content, documentation, open-source code. No residency constraints. Serve from the nearest CDN edge.
Build this classification into your data pipeline. Tag data at ingestion, route based on tags. If a record contains PII, it goes to the regional store. If it is an aggregated metric, it goes to the central warehouse.
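The tag-then-route step can be sketched as follows. The field names, the `public` flag, and the destination strings are hypothetical; a real pipeline would classify against a schema registry or data catalog rather than a hard-coded set.

```python
from enum import Enum


class DataClass(Enum):
    REGULATED = "regulated"
    BUSINESS = "business"
    PUBLIC = "public"


# Hypothetical field names that mark a record as strictly regulated.
REGULATED_FIELDS = {"email", "national_id", "diagnosis", "card_number"}


def classify(record: dict) -> DataClass:
    """Tag a record at ingestion based on the fields it carries."""
    if REGULATED_FIELDS.intersection(record):
        return DataClass.REGULATED
    if record.get("public") is True:
        return DataClass.PUBLIC
    return DataClass.BUSINESS


def route(record: dict, home_region: str) -> str:
    """Pick a destination from the tag; regulated data stays regional."""
    tag = classify(record)
    if tag is DataClass.REGULATED:
        return f"regional-store/{home_region}"
    if tag is DataClass.BUSINESS:
        return "central-warehouse"
    return "cdn-origin"
```

Checking for regulated fields first means a record that mixes PII with aggregated metrics is treated as regulated, which is the conservative default.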
Cloud Provider Capabilities
AWS - More than 30 regions globally. Use region-specific S3 buckets and RDS instances. Be aware that some services (like IAM, Route 53, CloudFront) are global by nature. Service control policies or IAM policies that use the aws:RequestedRegion condition key can restrict which regions resources are created in. AWS GovCloud provides isolated regions for US government workloads. For the EU, AWS has announced a European Sovereign Cloud, a separate set of regions operated within the EU; check its current availability before planning around it.
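A region restriction on AWS is commonly expressed as a deny policy keyed on the `aws:RequestedRegion` condition key. The sketch below only builds the policy document as a Python dict; the list of exempted global-service actions is an assumption and would need to match the services you actually use.

```python
import json

# Assumption: actions for global services that must stay usable even
# though their requests are not tied to one of the allowed regions.
GLOBAL_SERVICE_ACTIONS = ["iam:*", "route53:*", "cloudfront:*", "sts:*"]


def region_deny_policy(allowed_regions) -> dict:
    """Build a policy that denies requests outside the allowed regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedRegions",
                "Effect": "Deny",
                "NotAction": GLOBAL_SERVICE_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": sorted(allowed_regions)
                    }
                },
            }
        ],
    }


def policy_json(allowed_regions) -> str:
    """Serialize the policy for attachment via your tooling of choice."""
    return json.dumps(region_deny_policy(allowed_regions), indent=2)
```

For example, `policy_json({"eu-west-1", "eu-central-1"})` produces a document that could be attached as a service control policy at the organization level.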
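GCP - Around 40 regions. Organization policies can restrict resource locations at the org, folder, or project level (constraints/gcp.resourceLocations). This is a hard constraint. Once set, new resources in supported services cannot be created outside the allowed locations, even by accident. It is not retroactive, so resources that already exist elsewhere stay where they are.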
Azure - 60+ regions including sovereign clouds for US Government and China (operated by 21Vianet). The original isolated Germany cloud was retired in 2021 and replaced by standard German regions. Azure Policy can enforce resource location constraints. The sovereign clouds are physically and logically isolated from the commercial cloud.
For all three: check that managed services you depend on actually keep data in the selected region. Some services process data in other regions for machine learning features, telemetry, or indexing. Read the fine print in the data processing addendum.
Cross-Border Transfer Mechanisms
When you do need to move data across jurisdictions, you need a legal basis:
Standard Contractual Clauses (SCCs) - The main mechanism for EU-to-non-EU transfers after Schrems II. These are pre-approved contract templates from the European Commission. You sign them with your data importer and supplement them with a Transfer Impact Assessment (TIA) that evaluates whether the destination country's surveillance laws undermine the protections.
Binding Corporate Rules (BCRs) - For intra-company transfers within a multinational. Expensive to set up (takes 1-2 years for approval) but covers all group entities once approved. Only makes sense for large organizations with significant intra-company data flows.
Adequacy decisions - The European Commission has decided that some countries provide "adequate" data protection. If your data goes to Japan, South Korea, the UK, Canada (commercial organizations only), or a few others, no additional mechanism is needed. The US now has an EU-US Data Privacy Framework (replacing Privacy Shield), which covers only US organizations that have self-certified under it, and its long-term survival is uncertain.
Derogations - Explicit consent, contractual necessity, and public interest can justify transfers in specific cases, but these are narrow exceptions, not broad strategies. Do not rely on consent as your primary transfer mechanism for ongoing data flows.
Architecture Pitfalls
These are the ways data residency breaks in practice:
CDN edge caches - CloudFront, Cloudflare, and Akamai cache content at edge locations worldwide by default. If a response contains personal data and gets cached, a copy sits at every edge location that served it, potentially in many countries. Restrict edge caching to static assets, mark responses that carry personal data as uncacheable, or use the provider's regional controls (where offered) to limit which edge locations serve content.
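Keeping personal data out of shared caches usually comes down to the `Cache-Control` response header. A minimal sketch, assuming the application already knows whether a response carries personal data; the `cache_headers` helper and its default lifetime are illustrative.

```python
def cache_headers(contains_personal_data: bool, max_age_seconds: int = 86400) -> dict:
    """Return response headers that keep personal data out of shared caches.

    "private, no-store" tells CDNs and other shared caches not to store
    the response; static assets stay publicly cacheable.
    """
    if contains_personal_data:
        return {"Cache-Control": "private, no-store"}
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}
```

Defaulting to the restrictive header whenever the classification is unknown is the safer choice, since a wrongly cached profile page is harder to undo than a missed cache hit.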
Third-party analytics and logging - Segment, Mixpanel, Datadog, and Sentry all process data on their infrastructure, which is often in the US. If you send events containing user IPs, emails, or session data to these services, you have a cross-border transfer. Either use their EU data residency options (not all offer this) or strip PII before sending.
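Stripping PII before an event leaves your infrastructure can be as simple as a denylist filter applied to the payload. This is a sketch under assumptions: the key names in `PII_KEYS` are hypothetical, and real event schemas usually need a reviewed allowlist rather than a denylist.

```python
# Hypothetical keys treated as personal data in outbound events.
PII_KEYS = {"email", "ip", "ip_address", "name", "phone", "session_id"}


def strip_pii(value):
    """Recursively drop denylisted keys from dicts, including nested ones."""
    if isinstance(value, dict):
        return {
            key: strip_pii(item)
            for key, item in value.items()
            if str(key).lower() not in PII_KEYS
        }
    if isinstance(value, list):
        return [strip_pii(item) for item in value]
    return value
```

Calling `strip_pii(event)` immediately before handing the event to the vendor SDK keeps the filter in one place instead of relying on every call site to remember it.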
Backup replication - Automated cross-region backup replication is great for disaster recovery but creates copies of personal data in other jurisdictions. Configure backup regions deliberately. If your primary is in eu-west-1, replicate to eu-central-1, not to us-east-1.
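A deliberate backup configuration can be enforced with a small check in deployment tooling. The sketch below assumes a hand-maintained table of which jurisdiction each region belongs to; both the table and the `validate_backup_target` helper are hypothetical.

```python
# Hypothetical mapping of cloud regions to the jurisdiction they sit in.
JURISDICTION_BY_REGION = {
    "eu-west-1": "EU",
    "eu-central-1": "EU",
    "us-east-1": "US",
    "us-west-2": "US",
    "ap-southeast-1": "SG",
}


def validate_backup_target(primary: str, replica: str) -> None:
    """Raise ValueError if a backup would leave the primary's jurisdiction."""
    try:
        primary_j = JURISDICTION_BY_REGION[primary]
        replica_j = JURISDICTION_BY_REGION[replica]
    except KeyError as exc:
        raise ValueError(f"unknown region: {exc.args[0]}") from exc
    if primary_j != replica_j:
        raise ValueError(
            f"backup from {primary} ({primary_j}) to {replica} ({replica_j}) "
            "crosses jurisdictions"
        )
```

Running this against every replication rule in CI turns an accidental cross-border copy into a failed build rather than a compliance finding.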
DNS resolution - Some DNS providers log query data (including querier IP addresses) in centralized systems. If you use a US-based DNS provider, the addresses of EU resolvers, and user IP prefixes when EDNS Client Subnet is enabled, can flow to US infrastructure whenever a lookup is not answered from cache.
Error tracking - Stack traces and error payloads often contain user data. If your error tracking service is US-based and receives a stack trace with a European user's email address in a variable, that is a cross-border transfer.
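Stack traces are free text, so key-based filtering does not reach them. A rough pattern-based scrubber can catch the obvious cases before the payload is sent. The regular expressions below are deliberately simple assumptions and will miss some formats (IPv6 addresses, for instance).

```python
import re

# Simplified patterns; good enough for a first pass, not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def scrub_text(text: str) -> str:
    """Mask email addresses and IPv4 addresses in free-form error text."""
    text = EMAIL_RE.sub("[email]", text)
    return IPV4_RE.sub("[ip]", text)
```

Most error-tracking SDKs expose some hook that runs before an event is transmitted; a function like this would be called from that hook on the message and on captured variable values.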
Key Points
- Data residency is about where data is physically stored; data sovereignty is about which country's laws govern that data. They overlap but are not the same thing.
- After Schrems II, transferring EU personal data to the US requires Standard Contractual Clauses plus a Transfer Impact Assessment, unless the recipient is certified under the EU-US Data Privacy Framework. Privacy Shield is dead.
- China's PIPL requires certain operators to store personal data collected in China locally, with a security assessment or another approved mechanism required before any cross-border transfer.
- Not all data needs localization. Classify data by sensitivity and apply residency rules only where regulations actually require it.
- CDN edge caches, analytics tools, and backup replication are the three most common ways data accidentally crosses borders.
Common Mistakes
- Assuming that using an EU AWS region means all data stays in the EU. Control plane metadata, DNS queries, and some managed service telemetry may still route through US endpoints.
- Forgetting that CDN edge caches replicate content globally by default, which can put personal data in jurisdictions you did not intend.
- Using a US-headquartered analytics or logging provider without evaluating whether personal data flows to their US infrastructure.
- Building a single-region architecture and trying to bolt on data residency later. Retrofitting geo-partitioned data is an order of magnitude harder than designing for it upfront.