EU AI Act Compliance
What This Means for Your Sprint
Your PM walks into standup and says the EU AI Act applies to your product. Before you start estimating a six-month compliance project, take a breath. The first question is simple: what risk tier does your system fall into? That answer determines whether you need to change anything at all or whether you're looking at significant engineering work.
Unacceptable risk systems are banned outright: social scoring by governments, real-time remote biometric identification in public spaces (with narrow exceptions), and AI designed to manipulate vulnerable groups. If you're building these, compliance isn't the right word. You stop.
High-risk is where the engineering work lives. Annex III lists eight domains: biometrics, critical infrastructure, education, employment, essential services, law enforcement, migration, and justice. If your AI system makes or materially influences decisions in these areas, you trigger the full requirements. A recruitment screening tool that filters resumes? High-risk. A credit scoring model? High-risk. A medical diagnostic assistant? High-risk.
Limited risk means transparency obligations. Your chatbot needs to tell users they're talking to AI. Deepfake content needs labels. Emotion recognition systems need disclosure. These are design changes, not architectural ones.
Minimal risk has no mandatory requirements. Spam filters, game AI, inventory optimization, content recommendations on a music app. Most consumer ML falls here, and voluntary codes of conduct are the only expectation.
Engineering Work by Risk Tier
For minimal and limited risk systems, the work is small. Add disclosure UI, update your terms of service, document what you already have. A few sprints at most.
For high-risk systems, you're building compliance infrastructure. Here's what the regulation actually requires, translated into engineering tasks.
Model registry with lineage tracking. Every model in production needs a record of its training data, hyperparameters, evaluation metrics, and known limitations. MLflow handles this well if you instrument training pipelines to log metadata automatically. Weights & Biases provides richer experiment tracking with built-in model cards. DVC works for data versioning. The key is capturing this at training time, not reconstructing it months later. Spotify's ML platform team built model cards into their training pipeline so compliance documentation is a byproduct of the normal workflow, not a separate process.
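A minimal sketch of what that instrumentation can look like, assuming MLflow and a scikit-learn training job. The run name, tag keys, and data path are illustrative, not MLflow or regulatory conventions; adapt them to your own registry:

```python
# Minimal sketch: capture compliance metadata at training time with MLflow.
# The run name, tag keys, and data path below are illustrative placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="resume-screener-v3"):
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Hyperparameters and evaluation metrics become part of the model record.
    mlflow.log_params({"C": model.C, "max_iter": model.max_iter})
    mlflow.log_metric(
        "test_auc", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    )

    # Lineage and known limitations go in tags so documentation can be
    # generated later instead of reconstructed months after the fact.
    mlflow.set_tags({
        "training_data": "s3://hr-data/resumes/2025-06-01",  # illustrative path
        "known_limitations": "Not validated on non-EU resume formats",
        "intended_use": "Pre-screening ranking; human review required",
    })

    # Registering the model creates the versioned registry entry.
    mlflow.sklearn.log_model(model, "model", registered_model_name="resume-screener")
```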
Bias testing in CI. The regulation requires training, validation, and testing data to be relevant, representative, and, to the best extent possible, free of errors. In practice, this means automated bias testing that runs on every model update. Fairlearn (Microsoft's open-source toolkit) can check for demographic parity, equalized odds, and calibration across protected groups. Google's What-If Tool provides interactive exploration for teams that want visual analysis before they automate. Wire this into your CI pipeline so a model can't reach production without passing bias checks.
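A minimal sketch of such a CI gate built on Fairlearn. The 0.10 disparity limit is an illustrative policy choice, not a value from the regulation:

```python
# Minimal sketch of a CI bias gate using Fairlearn. The 0.10 limit is an
# illustrative policy choice; the regulation prescribes no numeric cutoff.
import sys

import numpy as np
from fairlearn.metrics import (
    demographic_parity_difference,
    equalized_odds_difference,
)

def bias_gate(y_true, y_pred, sensitive, max_disparity=0.10):
    """Return False (fail the build) if disparity exceeds the policy limit."""
    dpd = demographic_parity_difference(y_true, y_pred, sensitive_features=sensitive)
    eod = equalized_odds_difference(y_true, y_pred, sensitive_features=sensitive)
    print(f"demographic parity diff: {dpd:.3f}, equalized odds diff: {eod:.3f}")
    return dpd <= max_disparity and eod <= max_disparity

if __name__ == "__main__":
    # Toy data standing in for your held-out evaluation set and its
    # protected-attribute column.
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, 500)
    y_pred = rng.integers(0, 2, 500)
    groups = rng.choice(["A", "B"], 500)
    sys.exit(0 if bias_gate(y_true, y_pred, groups) else 1)
```

A nonzero exit code from this step blocks the deploy, which is the point: the gate runs on every model update, not on a quarterly audit schedule.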
Human-in-the-loop review. High-risk systems need mechanisms for humans to understand, monitor, and override system outputs. This isn't just a dashboard. It means confidence thresholds below which a human must review the output before it takes effect. For a recruitment screening tool, that might mean any candidate scored below the 30th percentile gets flagged for manual review. For a credit scoring model, it might mean decisions near the approval boundary require a human sign-off. Build the review queue, the override mechanism, and the audit log that records every human intervention.
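A minimal sketch of the routing logic, continuing the recruitment example. The threshold, in-memory queue, and audit list are hypothetical stand-ins for whatever durable storage you actually use:

```python
# Minimal sketch of confidence-threshold routing for human review. The
# threshold, queue, and audit list are hypothetical stand-ins; a real
# system persists both to durable storage.
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone

REVIEW_THRESHOLD = 0.30  # illustrative: scores below this need a human

@dataclass
class Decision:
    candidate_id: str
    score: float
    auto_decided: bool
    outcome: str
    reviewer: str | None = None
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

review_queue: list[Decision] = []
audit_log: list[Decision] = []

def route(candidate_id: str, score: float) -> Decision:
    """Auto-decide high-confidence cases; queue the rest for a human."""
    if score < REVIEW_THRESHOLD:
        d = Decision(candidate_id, score, auto_decided=False, outcome="pending_review")
        review_queue.append(d)
    else:
        d = Decision(candidate_id, score, auto_decided=True, outcome="advance")
    audit_log.append(replace(d))  # snapshot: every decision is recorded
    return d

def human_override(decision: Decision, outcome: str, reviewer: str) -> None:
    """Apply a reviewer's decision and record who intervened and when."""
    decision.outcome = outcome
    decision.reviewer = reviewer
    decision.timestamp = datetime.now(timezone.utc).isoformat()
    audit_log.append(replace(decision))  # second snapshot records the intervention
```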
Continuous monitoring. The regulation requires ongoing risk management, not a one-time assessment. Set up drift detection for your model's input distributions and output patterns. Prometheus can track prediction distributions over time. If your credit scoring model suddenly starts rejecting 40% more applications from a specific postal code, you need to catch that before a regulator does.
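Tools like Evidently AI and NannyML (see the table below) wrap richer versions of this check, but the core idea is simple enough to sketch directly. A minimal version using a two-sample Kolmogorov-Smirnov test, with an illustrative alert level:

```python
# Minimal sketch of input drift detection: compare a production window
# against the training reference with a two-sample KS test. The 0.01
# alert level is an illustrative policy choice.
import numpy as np
from scipy.stats import ks_2samp

def drifted(reference: np.ndarray, live: np.ndarray, alpha: float = 0.01) -> bool:
    """Return True if the live distribution has drifted from the reference."""
    result = ks_2samp(reference, live)
    print(f"KS statistic={result.statistic:.3f}, p={result.pvalue:.4f}")
    return result.pvalue < alpha

# Toy example: an income feature shifts downward in production.
rng = np.random.default_rng(1)
training_incomes = rng.normal(52_000, 12_000, 10_000)
live_incomes = rng.normal(46_000, 12_000, 2_000)  # simulated drift

if drifted(training_incomes, live_incomes):
    print("Input drift detected: investigate before a regulator does.")
```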
Automated documentation generation. Conformity assessments require detailed technical documentation: model architecture, training methodology, performance metrics, known limitations, intended use cases. If this is a Google Doc someone updates quarterly, it will be wrong. Generate it from your ML pipeline metadata. MLflow's model registry can export most of what you need. Build a CI job that regenerates documentation artifacts on every model version bump.
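A minimal sketch of such a CI job, assuming your model lives in the MLflow registry. The model name and output path are illustrative, and a real conformity document needs far more sections than this:

```python
# Minimal sketch of a CI job that renders technical documentation from
# MLflow registry metadata. Model name and output path are illustrative.
import os

from mlflow.tracking import MlflowClient

MODEL_NAME = "resume-screener"  # illustrative registry name

client = MlflowClient()
# Stage-based lookup; newer MLflow versions prefer model version aliases.
version = client.get_latest_versions(MODEL_NAME, stages=["Production"])[0]
run = client.get_run(version.run_id)

doc = [
    f"# Technical Documentation: {MODEL_NAME} v{version.version}",
    f"Source run: {version.run_id}",
    "## Hyperparameters",
    *(f"- {k}: {v}" for k, v in run.data.params.items()),
    "## Evaluation Metrics",
    *(f"- {k}: {v}" for k, v in run.data.metrics.items()),
    "## Lineage and Known Limitations",
    *(f"- {k}: {v}" for k, v in run.data.tags.items() if not k.startswith("mlflow.")),
]

os.makedirs("docs", exist_ok=True)
with open(f"docs/{MODEL_NAME}-v{version.version}.md", "w") as f:
    f.write("\n".join(doc) + "\n")
```

Run it on every model version bump and the documentation can't drift from the model it describes.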
Real Tooling That Maps to Requirements
| Requirement | Tools | What They Solve |
|---|---|---|
| Model documentation | MLflow Model Registry, W&B Model Cards | Training metadata, lineage, versioning |
| Bias testing | Fairlearn, Google What-If Tool, Aequitas | Demographic parity, calibration checks |
| Data versioning | DVC, LakeFS | Training data reproducibility |
| Compliance testing | AI Verify (Singapore IMDA), ALTAI | Self-assessment against regulatory criteria |
| Drift detection | Evidently AI, NannyML | Input/output distribution monitoring |
| Human oversight | Custom review queues, Labelbox | Override mechanisms, audit trails |
Singapore's AI Verify toolkit deserves special mention. It's open-source, provides a structured testing framework, and maps well to the EU AI Act's requirements even though it was built for Singapore's governance framework. Several European companies are using it as a starting point for their conformity assessments.
The Timeline Is Tighter Than You Think
Banned practices took effect in February 2025. That's already in the past.
GPAI (general-purpose AI) model obligations apply from August 2025. If your company provides foundation models or fine-tuned models to downstream users, you need technical documentation, copyright compliance processes, and training data summaries.
Full high-risk system requirements hit August 2026. That's roughly 18 months from now. If your AI system falls in Annex III and you haven't started building compliance infrastructure, the math gets uncomfortable. Model registries, bias testing pipelines, human override mechanisms, and documentation automation don't ship in a single quarter.
The practical advice: do the risk classification now. If you're minimal or limited risk, relax and make the small changes needed. If you're high-risk, start instrumenting your ML pipeline for compliance metadata this quarter. The engineering work is tractable if you start early. It becomes a crisis if you wait.
Key Points
- Your PM just told you the EU AI Act applies to your recommendation engine. Before you panic, check Annex III. Most consumer ML features (search ranking, content recommendations, spam filters) land in minimal or limited risk. Recruitment tools, credit scoring, and medical diagnostics are where the heavy compliance hits
- High-risk classification means you need infrastructure most ML teams don't have yet: model registries with lineage tracking, automated bias testing in CI, human-in-the-loop review for low-confidence predictions, and documentation generated from pipeline metadata rather than written by hand
- MLflow, Weights & Biases, and DVC can handle most of the technical documentation requirements if you instrument them early. Retrofitting compliance onto a mature model is 5-10x harder than building it in from the start
- Conformity assessments for most high-risk categories are self-assessed, but biometric systems require third-party audits. Either way, the documentation burden is substantial. Singapore's AI Verify toolkit provides a useful open-source testing framework even if you're not operating in Singapore
- The enforcement timeline is tighter than it looks. Banned practices took effect in February 2025. GPAI rules apply from August 2025. Full high-risk requirements hit August 2026. The architectural decisions your team makes this quarter determine how painful compliance will be
Common Mistakes
- ✗ Assuming 'we only serve EU users through a US entity' exempts you. The AI Act applies to any AI system whose output is used in the EU, regardless of where the provider is based. If your model scores a loan application for an EU resident, you're in scope
- ✗ Classifying your system as minimal risk without actually reading Annex III. The annex lists specific high-risk use cases across eight domains, and some are surprising. An AI tool that prioritizes job applications? High-risk. An AI tool that routes customer support tickets? Probably minimal
- ✗ Treating compliance as a legal team problem. Legal can interpret the regulation, but engineers have to build the model registry, bias testing pipeline, human override mechanisms, and audit logging. If your engineering team first hears about this six months before the deadline, you're already behind
- ✗ Building compliance artifacts manually after model training. If your model card is a Google Doc someone fills out quarterly, it will be incomplete and outdated. Generate documentation from your ML pipeline metadata automatically