EU AI Act Compliance
What This Means for Your Sprint
Your PM walks into standup and says the EU AI Act applies to your product. Before you start estimating a six-month compliance project, take a breath. The first question is simple: what risk tier does your system fall into? That answer determines whether you need to change anything at all or whether you're looking at significant engineering work.
Unacceptable risk systems are banned outright: social scoring by governments, real-time remote biometric identification in public spaces (with narrow exceptions), and AI designed to manipulate vulnerable groups. If you're building these, compliance isn't the right word. You stop.
High-risk is where the engineering work lives. Annex III lists eight domains: biometrics, critical infrastructure, education, employment, essential services, law enforcement, migration, and justice. If your AI system makes or materially influences decisions in these areas, you trigger the full requirements. A recruitment screening tool that filters resumes? High-risk. A credit scoring model? High-risk. A medical diagnostic assistant? High-risk.
Limited risk means transparency obligations. Your chatbot needs to tell users they're talking to AI. Deepfake content needs labels. Emotion recognition systems need disclosure. These are design changes, not architectural ones.
Minimal risk has no mandatory requirements. Spam filters, game AI, inventory optimization, content recommendations on a music app. Most consumer ML falls here, and voluntary codes of conduct are the only expectation.
Engineering Work by Risk Tier
For minimal and limited risk systems, the work is small. Add disclosure UI, update your terms of service, document what you already have. A few sprints at most.
For high-risk systems, you're building compliance infrastructure. Here's what the regulation actually requires, translated into engineering tasks.
Model registry with lineage tracking. Every model in production needs a record of its training data, hyperparameters, evaluation metrics, and known limitations. MLflow handles this well if you instrument training pipelines to log metadata automatically. Weights & Biases provides richer experiment tracking with built-in model cards. DVC works for data versioning. The key is capturing this at training time, not reconstructing it months later. Spotify's ML platform team built model cards into their training pipeline so compliance documentation is a byproduct of the normal workflow, not a separate process.
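A minimal sketch of what that instrumentation can look like, assuming MLflow and a scikit-learn training job. The run name, tag keys, and data path are illustrative, not MLflow or regulatory conventions; adapt them to your own registry:

```python
# Minimal sketch: capture compliance metadata at training time with MLflow.
# The run name, tag keys, and data path below are illustrative placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="resume-screener-v3"):
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Hyperparameters and evaluation metrics become part of the model record.
    mlflow.log_params({"C": model.C, "max_iter": model.max_iter})
    mlflow.log_metric(
        "test_auc", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    )

    # Lineage and known limitations go in tags so documentation can be
    # generated later instead of reconstructed months after the fact.
    mlflow.set_tags({
        "training_data": "s3://hr-data/resumes/2025-06-01",  # illustrative path
        "known_limitations": "Not validated on non-EU resume formats",
        "intended_use": "Pre-screening ranking; human review required",
    })

    # Registering the model creates the versioned registry entry.
    mlflow.sklearn.log_model(model, "model", registered_model_name="resume-screener")
```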
Bias testing in CI. The regulation requires training, validation, and testing data to be relevant, representative, and, to the best extent possible, free of errors. In practice, this means automated bias testing that runs on every model update. Fairlearn (Microsoft's open-source toolkit) can check for demographic parity, equalized odds, and calibration across protected groups. Google's What-If Tool provides interactive exploration for teams that want visual analysis before they automate. Wire this into your CI pipeline so a model can't reach production without passing bias checks.
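A minimal sketch of such a CI gate built on Fairlearn. The 0.10 disparity limit is an illustrative policy choice, not a value from the regulation:

```python
# Minimal sketch of a CI bias gate using Fairlearn. The 0.10 limit is an
# illustrative policy choice; the regulation prescribes no numeric cutoff.
import sys

import numpy as np
from fairlearn.metrics import (
    demographic_parity_difference,
    equalized_odds_difference,
)

def bias_gate(y_true, y_pred, sensitive, max_disparity=0.10):
    """Return False (fail the build) if disparity exceeds the policy limit."""
    dpd = demographic_parity_difference(y_true, y_pred, sensitive_features=sensitive)
    eod = equalized_odds_difference(y_true, y_pred, sensitive_features=sensitive)
    print(f"demographic parity diff: {dpd:.3f}, equalized odds diff: {eod:.3f}")
    return dpd <= max_disparity and eod <= max_disparity

if __name__ == "__main__":
    # Toy data standing in for your held-out evaluation set and its
    # protected-attribute column.
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, 500)
    y_pred = rng.integers(0, 2, 500)
    groups = rng.choice(["A", "B"], 500)
    sys.exit(0 if bias_gate(y_true, y_pred, groups) else 1)
```

A nonzero exit code from this step blocks the deploy, which is the point: the gate runs on every model update, not on a quarterly audit schedule.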
Human-in-the-loop review. High-risk systems need mechanisms for humans to understand, monitor, and override system outputs. This isn't just a dashboard. It means confidence thresholds below which a human must review the output before it takes effect. For a recruitment screening tool, that might mean any candidate scored below the 30th percentile gets flagged for manual review. For a credit scoring model, it might mean decisions near the approval boundary require a human sign-off. Build the review queue, the override mechanism, and the audit log that records every human intervention.
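A minimal sketch of the routing logic, continuing the recruitment example. The threshold, in-memory queue, and audit list are hypothetical stand-ins for whatever durable storage you actually use:

```python
# Minimal sketch of confidence-threshold routing for human review. The
# threshold, queue, and audit list are hypothetical stand-ins; a real
# system persists both to durable storage.
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone

REVIEW_THRESHOLD = 0.30  # illustrative: scores below this need a human

@dataclass
class Decision:
    candidate_id: str
    score: float
    auto_decided: bool
    outcome: str
    reviewer: str | None = None
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

review_queue: list[Decision] = []
audit_log: list[Decision] = []

def route(candidate_id: str, score: float) -> Decision:
    """Auto-decide high-confidence cases; queue the rest for a human."""
    if score < REVIEW_THRESHOLD:
        d = Decision(candidate_id, score, auto_decided=False, outcome="pending_review")
        review_queue.append(d)
    else:
        d = Decision(candidate_id, score, auto_decided=True, outcome="advance")
    audit_log.append(replace(d))  # snapshot: every decision is recorded
    return d

def human_override(decision: Decision, outcome: str, reviewer: str) -> None:
    """Apply a reviewer's decision and record who intervened and when."""
    decision.outcome = outcome
    decision.reviewer = reviewer
    decision.timestamp = datetime.now(timezone.utc).isoformat()
    audit_log.append(replace(decision))  # second snapshot records the intervention
```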
Continuous monitoring. The regulation requires ongoing risk management, not a one-time assessment. Set up drift detection for your model's input distributions and output patterns. Prometheus can track prediction distributions over time. If your credit scoring model suddenly starts rejecting 40% more applications from a specific postal code, you need to catch that before a regulator does.
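Tools like Evidently AI and NannyML (see the table below) wrap richer versions of this check, but the core idea is simple enough to sketch directly. A minimal version using a two-sample Kolmogorov-Smirnov test, with an illustrative alert level:

```python
# Minimal sketch of input drift detection: compare a production window
# against the training reference with a two-sample KS test. The 0.01
# alert level is an illustrative policy choice.
import numpy as np
from scipy.stats import ks_2samp

def drifted(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01) -> bool:
    """Return True if the live distribution has drifted from the reference."""
    result = ks_2samp(reference, live)
    print(f"KS statistic={result.statistic:.3f}, p={result.pvalue:.4f}")
    return result.pvalue < alpha

# Toy example: an income feature shifts downward in production.
rng = np.random.default_rng(1)
training_incomes = rng.normal(52_000, 12_000, 10_000)
live_incomes = rng.normal(46_000, 12_000, 2_000)  # simulated drift

if drifted(training_incomes, live_incomes):
    print("Input drift detected: investigate before a regulator does.")
```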
Automated documentation generation. Conformity assessments require detailed technical documentation: model architecture, training methodology, performance metrics, known limitations, intended use cases. If this is a Google Doc someone updates quarterly, it will be wrong. Generate it from your ML pipeline metadata. MLflow's model registry can export most of what you need. Build a CI job that regenerates documentation artifacts on every model version bump.
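A minimal sketch of such a CI job, assuming your model lives in the MLflow registry. The model name and output path are illustrative, and a real conformity document needs far more sections than this:

```python
# Minimal sketch of a CI job that renders technical documentation from
# MLflow registry metadata. Model name and output path are illustrative.
import os

from mlflow.tracking import MlflowClient

MODEL_NAME = "resume-screener"  # illustrative registry name

client = MlflowClient()
# Stage-based lookup; newer MLflow versions prefer model version aliases.
version = client.get_latest_versions(MODEL_NAME, stages=["Production"])[0]
run = client.get_run(version.run_id)

doc = [
    f"# Technical Documentation: {MODEL_NAME} v{version.version}",
    f"Source run: {version.run_id}",
    "## Hyperparameters",
    *(f"- {k}: {v}" for k, v in run.data.params.items()),
    "## Evaluation Metrics",
    *(f"- {k}: {v}" for k, v in run.data.metrics.items()),
    "## Lineage and Known Limitations",
    *(f"- {k}: {v}" for k, v in run.data.tags.items() if not k.startswith("mlflow.")),
]

os.makedirs("docs", exist_ok=True)
with open(f"docs/{MODEL_NAME}-v{version.version}.md", "w") as f:
    f.write("\n".join(doc) + "\n")
```

Run it on every model version bump and the documentation can't drift from the model it describes.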
Real Tooling That Maps to Requirements
| Requirement | Tools | What They Solve |
|---|---|---|
| Model documentation | MLflow Model Registry, W&B Model Cards | Training metadata, lineage, versioning |
| Bias testing | Fairlearn, Google What-If Tool, Aequitas | Demographic parity, calibration checks |
| Data versioning | DVC, LakeFS | Training data reproducibility |
| Compliance testing | AI Verify (Singapore IMDA), ALTAI | Self-assessment against regulatory criteria |
| Drift detection | Evidently AI, NannyML | Input/output distribution monitoring |
| Human oversight | Custom review queues, Labelbox | Override mechanisms, audit trails |
Singapore's AI Verify toolkit deserves special mention. It's open-source, provides a structured testing framework, and maps well to the EU AI Act's requirements even though it was built for Singapore's governance framework. Several European companies are using it as a starting point for their conformity assessments.
The Timeline Is Tighter Than You Think
Banned practices took effect in February 2025. That's already in the past.
GPAI (general-purpose AI) model obligations apply from August 2025. If your company provides foundation models or fine-tuned models to downstream users, you need technical documentation, copyright compliance processes, and training data summaries.
Full high-risk system requirements hit August 2026. That's roughly 18 months from now. If your AI system falls in Annex III and you haven't started building compliance infrastructure, the math gets uncomfortable. Model registries, bias testing pipelines, human override mechanisms, and documentation automation don't ship in a single quarter.
The practical advice: do the risk classification now. If you're minimal or limited risk, relax and make the small changes needed. If you're high-risk, start instrumenting your ML pipeline for compliance metadata this quarter. The engineering work is tractable if you start early. It becomes a crisis if you wait.
Key Points
- Your PM just told you the EU AI Act applies to your recommendation engine. Before you panic, check Annex III. Most consumer ML features (search ranking, content recommendations, spam filters) land in minimal or limited risk. Recruitment tools, credit scoring, and medical diagnostics are where the heavy compliance hits
- High-risk classification means you need infrastructure most ML teams don't have yet: model registries with lineage tracking, automated bias testing in CI, human-in-the-loop review for low-confidence predictions, and documentation generated from pipeline metadata rather than written by hand
- MLflow, Weights & Biases, and DVC can handle most of the technical documentation requirements if you instrument them early. Retrofitting compliance onto a mature model is 5-10x harder than building it in from the start
- Conformity assessments for most high-risk categories are self-assessed, but biometric systems require third-party audits. Either way, the documentation burden is substantial. Singapore's AI Verify toolkit provides a useful open-source testing framework even if you're not operating in Singapore
- The enforcement timeline is tighter than it looks. Banned practices took effect in February 2025. GPAI rules apply from August 2025. Full high-risk requirements hit August 2026. The architectural decisions your team makes this quarter determine how painful compliance will be
Common Mistakes
- ✗ Assuming 'we only serve EU users through a US entity' exempts you. The AI Act applies to any AI system whose output is used in the EU, regardless of where the provider is based. If your model scores a loan application for an EU resident, you're in scope
- ✗ Classifying your system as minimal risk without actually reading Annex III. The annex lists specific high-risk use cases across eight domains, and some are surprising. An AI tool that prioritizes job applications? High-risk. An AI tool that routes customer support tickets? Probably minimal
- ✗ Treating compliance as a legal team problem. Legal can interpret the regulation, but engineers have to build the model registry, bias testing pipeline, human override mechanisms, and audit logging. If your engineering team first hears about this six months before the deadline, you're already behind
- ✗ Building compliance artifacts manually after model training. If your model card is a Google Doc someone fills out quarterly, it will be incomplete and outdated. Generate documentation from your ML pipeline metadata automatically