EU AI Act compliance is not a document you write once. It is a set of technical controls that must run on your AI systems continuously and a body of evidence you must be able to produce on demand. The Regulation, Regulation (EU) 2024/1689, tells you what the system must do; it does not tell you how to build the controls or where to keep the records. That gap is where most teams will lose time, and where the wrong architectural choice creates a fresh compliance problem while solving the first one.
This guide is a technical checklist, not a legal explainer. It maps the three articles that carry the heaviest engineering burden for a high-risk AI system (Article 9 on risk management, Article 12 on record-keeping, and Article 14 on human oversight) to the specific control you have to run and the exact evidence artefact an auditor will ask for. It then makes the architectural argument that every compliance-SaaS checklist avoids: the records you are required to keep are themselves sensitive data about regulated decisions, so generating and storing them on infrastructure you control is not a preference; it is part of the control.
If you need the upstream framework that places these articles inside a NIST and ISO control structure, that is a separate piece of work covered in the AI governance framework guide and the ISO 42001 management system guide. This guide assumes you have a system in scope and need to know what to build.
What EU AI Act compliance actually requires of an engineer
Start with the answer. For a high-risk AI system under the Act, the obligations that translate into code, configuration, and retained data are these:
- Run a continuous risk management process (Article 9) and keep its outputs as living records, not a one-time assessment.
- Emit automatic event logs over the system's lifetime (Article 12) and retain them for at least six months (Article 19).
- Build human oversight into the system (Article 14) so a person can interpret, override, and stop it, and prove they can.
- Maintain an Annex IV technical file (Article 11) and instructions for use (Article 13) that tell a deployer how to read the logs.
- Be able to produce all of the above for a market surveillance authority without exporting it to a third party first.
Everything else on a compliance checklist (classification, conformity assessment, EU database registration) is paperwork that points at these controls. The controls are the hard part because they have to operate on live traffic and leave an evidence trail. The table below maps each obligation to the control and the artefact, and it is the spine of the rest of this guide.
| Article | Obligation in plain terms | Technical control you must run | Evidence artefact you must produce |
|---|---|---|---|
| Art 9 | Continuous risk management across the lifecycle | A risk register updated from live monitoring data, with testing gates before release | Versioned risk register plus test results, dated, per system version |
| Art 10 | Data governance for training and input data | Dataset lineage, quality and bias checks | Data provenance records and validation reports |
| Art 11 + Annex IV | Technical documentation kept current | A maintained technical file generated from the build | Annex IV file matching the deployed version |
| Art 12 | Automatic logging over the system lifetime | Structured, synchronous event capture on every AI action | Queryable, tamper-evident event log |
| Art 13 | Transparency and instructions for use | Documented log schema and how to interpret it | Deployer-facing instructions for reading the logs |
| Art 14 | Human oversight | Interpretable output, override and stop controls, access gating | Records of overrides, interventions and (for biometrics) dual verification |
| Art 19 | Log retention | Retention policy of at least six months on the event store | Logs retrievable for the full retention window |
| Art 26 | Deployer keeps logs under its control | Deployer-side custody of the generated logs | Logs held by the deployer for the retention period |
The rest of this guide works down that table, starting with the deadline that makes it urgent, then the three articles that carry the most engineering, then the architecture decision that determines whether your evidence helps you or hurts you.
When does EU AI Act compliance become enforceable?
The Act did not switch on all at once. It phases in, and the phase that matters for a high-risk system is 2 August 2026. Knowing the exact sequence matters because the prohibited-practice rules are already live, and a system you build today has to be compliant before the high-risk obligations bite, not after.
The Regulation entered into force on 1 August 2024 and applies in waves set out on the European Commission's regulatory framework page.
| Date | Milestone | What becomes enforceable |
|---|---|---|
| 1 Aug 2024 | Entry into force | The Regulation is law; clocks start for every later phase |
| 2 Feb 2025 | Prohibited practices and AI literacy | Article 5 bans (social scoring, certain biometric uses) apply; staff AI literacy obligations begin |
| 2 Aug 2025 | General-purpose AI models | Obligations for GPAI model providers, plus the governance and penalty provisions, apply |
| 2 Aug 2026 | Main applicability | Most high-risk obligations apply, including Articles 9, 12, 13, 14 and 15 for Annex III systems |
| 2 Dec 2027 | Later high-risk wave | Further high-risk categories enter application per the Commission timeline |
| 2 Aug 2028 | Product-embedded high-risk | AI that is a safety component of products under existing EU harmonisation law |
The practical reading: if you operate or deploy an Annex III high-risk system into the EU, the controls in this guide must be running by 2 August 2026. The prohibited practices in Article 5 are already enforceable, so the first item on any checklist is confirming you are not running one of them. The penalties section at the end of this guide explains why the deadline gets board attention.
Which systems are actually high-risk?
Most of the Act's weight falls on high-risk systems, so the first technical decision is whether yours is one. Getting this wrong in either direction is expensive: over-scope and you build conformity machinery you do not need; under-scope and you miss the 2 August 2026 obligations entirely.
Article 6 sets two pathways. A system is high-risk if it is a safety component of a product (or is itself a product) covered by the EU harmonisation legislation listed in Annex I and that product must undergo third-party conformity assessment, or if it falls into one of the use cases listed in Annex III. The Annex III areas are the ones most software teams hit:
- Biometric identification and categorisation
- Critical infrastructure management and operation
- Education and vocational training (for example, scoring exams or admissions)
- Employment, worker management and access to self-employment (CV filtering, promotion decisions)
- Access to essential private and public services (credit scoring, benefits eligibility, insurance pricing)
- Law enforcement
- Migration, asylum and border control
- Administration of justice and democratic processes
There is a narrow off-ramp. Article 6(3) lets you treat an Annex III system as not high-risk if it does not pose a significant risk of harm to health, safety or fundamental rights, for example because it performs a narrow procedural task or only improves the result of a previously completed human activity. The catch: a system that performs profiling of natural persons is always high-risk, and if you claim the derogation you must document the assessment before the system is placed on the market or put into service and produce it on request. That documented assessment is itself an evidence artefact. If you run autonomous agents that touch any of these domains, treat them as in scope until a documented assessment says otherwise; the related risk surface is covered in the shadow AI governance guide.
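The classification logic described above can be sketched as a small decision function. This is a minimal illustration, not legal advice: the `SystemProfile` fields, the `RiskClass` names, and the `derogation_assessment_ref` document ID are hypothetical, and each boolean is assumed to be the result of a human legal assessment made elsewhere.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RiskClass(Enum):
    HIGH_RISK = "high-risk"
    DEROGATION_DOCUMENTED = "not high-risk, Article 6(3) assessment on file"
    NOT_HIGH_RISK = "not high-risk under Article 6"


@dataclass(frozen=True)
class SystemProfile:
    name: str
    annex_i_pathway: bool      # meets the Annex I product pathway
    annex_iii_use_case: bool   # falls into an Annex III area
    performs_profiling: bool   # profiling of natural persons
    # Hypothetical reference to the written Article 6(3) assessment, if any.
    derogation_assessment_ref: Optional[str] = None


def classify(system: SystemProfile) -> RiskClass:
    if system.annex_i_pathway:
        return RiskClass.HIGH_RISK
    if not system.annex_iii_use_case:
        return RiskClass.NOT_HIGH_RISK
    # Profiling always keeps an Annex III system high-risk.
    if system.performs_profiling:
        return RiskClass.HIGH_RISK
    # The derogation only counts once the assessment is documented.
    if system.derogation_assessment_ref:
        return RiskClass.DEROGATION_DOCUMENTED
    # Default to in scope until a documented assessment says otherwise.
    return RiskClass.HIGH_RISK
```

The design choice worth noting is the default: an Annex III system with no assessment on file is treated as high-risk, which mirrors the "in scope until documented otherwise" rule in the text.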
Article 9: turn risk management into living evidence
Article 9 is where teams that are good at shipping but new to regulated compliance get caught, because it forbids the thing engineers naturally do: a one-time risk assessment at launch.
Article 9 requires a risk management system that is "a continuous iterative process planned and run throughout the entire lifecycle of a high-risk AI system, requiring regular systematic review and updating." That single sentence has two engineering consequences. First, the risk register is a living document tied to the system version, not a slide deck from the kickoff. Second, it has to be fed by real operational data, because the Article requires you to evaluate risks that emerge from post-market monitoring, which only exists once the system is running.
The four-step cycle the Article describes maps cleanly onto controls you can run:
- Identify and analyse known and reasonably foreseeable risks to health, safety, and fundamental rights from the system's intended use. This is your initial risk register.
- Estimate and evaluate risks from intended use and from reasonably foreseeable misuse. Misuse matters: a CV-screening model used outside its intended population is a foreseeable misuse you must consider.
- Evaluate risks from post-market monitoring data. This is the feedback loop. Your monitoring has to surface new risks back into the register.
- Adopt targeted risk management measures so residual risk is judged acceptable, prioritising elimination or reduction by design where technically feasible.
The Article also requires testing to find the most appropriate measures and to confirm consistent performance, carried out throughout development and, critically, "prior to their being placed on the market or put into service" against predefined metrics. In practice that means a release gate: a system version does not ship until its risk tests pass and the results are recorded against that version.
The evidence artefact Article 9 produces is a versioned risk register plus dated test results, one set per deployed version, with a visible link from a monitoring signal to a register update to a mitigation. An auditor will look for that chain. If your risk register has not changed since launch but your monitoring has flagged incidents, that is a finding. The general method for building the register (threat identification, scoring, treatment) is the same discipline covered in the AI risk management guide; what Article 9 adds is the legal requirement that it never stops.
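A versioned register with a release gate might look like the sketch below. Everything here is an assumption made for illustration: the `RiskEntry` and `RiskRegister` names, the "measured value must meet a threshold" convention, and the in-memory lists standing in for whatever store you actually use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RiskEntry:
    risk_id: str
    description: str
    mitigation: str
    metric: str       # name of the predefined test metric
    threshold: float  # minimum acceptable value for that metric
    source_signal: Optional[str] = None  # monitoring signal that raised it


@dataclass
class RiskRegister:
    system_version: str
    entries: list = field(default_factory=list)
    test_results: list = field(default_factory=list)

    def record_test(self, risk_id: str, measured: float) -> None:
        """Store a dated result against this system version."""
        entry = next(e for e in self.entries if e.risk_id == risk_id)
        self.test_results.append({
            "risk_id": risk_id,
            "system_version": self.system_version,
            "metric": entry.metric,
            "measured": measured,
            "passed": measured >= entry.threshold,
            "tested_at": datetime.now(timezone.utc).isoformat(),
        })

    def release_allowed(self) -> bool:
        """Gate: every registered risk needs a passing latest result."""
        latest = {r["risk_id"]: r["passed"] for r in self.test_results}
        return bool(self.entries) and all(
            latest.get(e.risk_id, False) for e in self.entries
        )
```

The `source_signal` field is what carries the chain from monitoring signal to register update to mitigation: a risk added after an incident records which signal raised it, and the gate stays closed until that new risk has a passing test.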
Article 12: record-keeping you can produce on demand
Article 12 is the obligation most likely to be failed quietly, because standard application logging looks like it satisfies it and does not. This is the heart of the technical checklist, so it gets the most space.
Article 12 states that "high-risk AI systems shall technically allow for the automatic recording of events (logs) over the lifetime of the system." Two words carry the weight. Automatic means the system generates the logs itself; a manual incident report or a human-written note does not count. Over the lifetime means from first deployment to retirement, not a rolling window you can lose.
The logs must enable three things, per Article 12(2):
- (a) identifying situations that may result in the system presenting a risk under Article 79(1) or in a substantial modification;
- (b) facilitating post-market monitoring under Article 72;
- (c) monitoring the operation of the high-risk system.
For remote biometric identification systems (Annex III point 1(a)), Article 12(3) is prescriptive about the minimum fields: the period of each use (start and end date and time), the reference database against which input data was checked, the input data for which the search led to a match, and the identification of the natural persons involved in verifying the results.
Crucially, the Act does not prescribe a log format or schema for the general case. That sounds like freedom; it is actually a trap, because neither the text nor the European Commission's AI Act Service Desk gives you a schema to conform to, so the burden of showing that your logs are adequate falls on you. The practical standard the regulation implies is structured, queryable, and tamper-evident records, because a market surveillance authority that asks for the events around an incident will not accept free-text application logs scattered across services. Standard logging fails three ways: it is unstructured, it is not retained long enough by default, and it is mutable, so it cannot demonstrate that a record was not altered after the fact.
Design the log record deliberately. The fields below are the minimum that lets a single event answer the three Article 12(2) questions and survive an audit.
| Log field | Why the Act needs it | Source obligation |
|---|---|---|
| Event timestamp (start and end) | Reconstruct the period of operation and each use | Art 12(2)(c), 12(3)(a) |
| System and model version | Tie the event to the risk-assessed version | Art 9, Art 12(2)(a) |
| Acting identity (user or agent) | Attribute the action; support human-oversight records | Art 14, Art 26 |
| Input reference (not raw sensitive data) | Trace what the decision acted on without over-retaining | Art 10, Art 12(3)(c) |
| Output or decision | Identify risk situations and substantial modifications | Art 12(2)(a) |
| Human-oversight action (override, confirm, stop) | Prove a person could and did intervene | Art 14(4) |
| Outcome status (allowed, blocked, flagged) | Support operational and post-market monitoring | Art 12(2)(b), 12(2)(c) |
| Integrity marker (hash or signature chain) | Demonstrate the record was not altered after the fact | Art 12(1) (automatic, reliable) |
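One way to pin the field design above down is a typed record that refuses malformed events at creation time. The sketch below is illustrative only: the Act prescribes no schema, so the `AIEvent` class, its field names, and the allowed value sets are all assumptions. Timestamps are assumed to be ISO 8601 strings in UTC with a consistent format, which is what makes the string comparison valid.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

OUTCOMES = {"allowed", "blocked", "flagged"}
OVERSIGHT_ACTIONS = {None, "override", "confirm", "stop"}


@dataclass(frozen=True)
class AIEvent:
    started_at: str      # ISO 8601, UTC
    ended_at: str        # ISO 8601, UTC
    system_version: str
    model_version: str
    actor_id: str        # user or agent identity
    input_ref: str       # pointer to the input, not the raw data
    output: str
    outcome: str         # allowed | blocked | flagged
    oversight_action: Optional[str] = None
    integrity_marker: Optional[str] = None  # filled in by the store

    def __post_init__(self) -> None:
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")
        if self.oversight_action not in OVERSIGHT_ACTIONS:
            raise ValueError(f"unknown action: {self.oversight_action!r}")
        if self.ended_at < self.started_at:
            raise ValueError("event ends before it starts")

    def to_json(self) -> str:
        # Sorted keys and fixed separators give a stable serialisation,
        # which matters once records are hashed.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
```

Making the record immutable (`frozen=True`) and validating it on construction means a structurally invalid event fails loudly on the request path instead of surfacing months later during an audit.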
On retention, Article 19 is specific: providers must keep the automatically generated logs "for a period appropriate to the intended purpose of the high-risk AI system, of at least six months, unless provided otherwise in the applicable Union or national law." Six months is a floor, not a target. Financial-services and other regulated sectors will have longer statutory retention that overrides it, and data-protection law may require you to retain less of the raw personal data while keeping the event record. The deployer-side duty in Article 26 means the organisation operating the system keeps these logs under its own control for the retention window, which is the legal seam that makes the architecture in the next section a compliance question rather than an IT preference.
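The "floor, not a target" rule reduces to a small calculation. This sketch assumes six months is approximated conservatively as 184 days (the longest run of six calendar months) and that any longer sector requirement is supplied by your own legal analysis; the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: 184 days is the longest possible span of six calendar months.
SIX_MONTH_FLOOR = timedelta(days=184)


def retention_period(sector_minimum: Optional[timedelta] = None) -> timedelta:
    """Return the longer of the six-month floor and any sector rule."""
    if sector_minimum is None:
        return SIX_MONTH_FLOOR
    return max(SIX_MONTH_FLOOR, sector_minimum)


def may_purge(event_time: datetime, now: datetime,
              retention: timedelta) -> bool:
    """An event record may be purged only after its retention has elapsed."""
    return now - event_time > retention
```

A sector rule shorter than the floor never lowers the result. Reducing the raw personal data held against an event for data-protection reasons is a separate decision from purging the event record itself, and this sketch does not model it.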
The capture has to be synchronous and on the request path. A log written asynchronously after the action can be lost if the process dies between action and write, and a gap in a lifetime log undermines the traceability Article 12(2) exists to ensure: you cannot identify a risk situation from an event that was never recorded. Emit the structured event as part of the same transaction that executes the AI action, hash-chain it so any later edit is detectable, and ship it to a store that enforces the retention policy.
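Hash-chaining is simple enough to show in full. The sketch below uses only the Python standard library; the in-memory list is a stand-in for a real store, and `append_event` and `verify_chain` are hypothetical names. In a real system the append would run inside the same transaction as the AI action.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> dict:
    """Append an event, linking it to the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {
        "event": event,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, event),
    }
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every link; any edited or removed record breaks the chain."""
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

A chain like this is tamper-evident rather than tamper-proof: someone with write access could rewrite every record and recompute every hash. Periodically anchoring the latest hash somewhere the log writers cannot modify is a common way to close that gap.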
Article 14: human oversight you can prove
Article 14 is deceptively soft. "A human in the loop" sounds like a policy, but the Article specifies capabilities the system must give that human, and each one is a control you build and an event you log.
Article 14 requires high-risk systems to be designed, including with appropriate human-machine interface tools, so they "can be effectively overseen by natural persons during the period in which they are in use." Effective is the operative word: a confirm button nobody can act on intelligently is not oversight. Article 14(4) lists what the person must be able to do, and these convert directly into product requirements:
- (a) understand the system's capacities and limitations and monitor its operation, including detecting anomalies;
- (b) stay aware of automation bias, the tendency to over-trust system output, especially in decision-support settings;
- (c) correctly interpret the output, using the interpretation tools the system provides;
- (d) decide not to use the system or to disregard, override, or reverse its output;
- (e) intervene in operation or stop the system through a halt or emergency-stop mechanism.
Read as an engineering spec, that is four controls. An interpretability surface so the human can understand and correctly read the output (a and c). An override path that lets a person reverse or ignore a recommendation (d). A stop mechanism that halts the system (e). And access gating so only competent, authorised people can do any of this, which is role-based access control applied to the oversight functions themselves. Every one of these produces an evidence artefact: the override is logged, the stop is logged, the access grant is logged. Without those records you have designed oversight but cannot prove it operated, and unprovable oversight is a finding.
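The access-gating control and its evidence trail can be sketched together. The role names, the `REQUIRED_ROLES` mapping, and the `oversee` function are hypothetical; the point of the sketch is only that every attempt, granted or denied, leaves a record.

```python
from datetime import datetime, timezone

# Assumption: which roles may perform which oversight action.
REQUIRED_ROLES = {
    "override": {"reviewer", "supervisor"},
    "stop": {"supervisor"},
}


class OversightDenied(PermissionError):
    """Raised when an actor lacks the role for an oversight action."""


def oversee(action: str, actor_id: str, actor_roles: set,
            decision_id: str, audit_log: list) -> dict:
    if action not in REQUIRED_ROLES:
        raise ValueError(f"unknown oversight action: {action!r}")
    granted = bool(REQUIRED_ROLES[action] & set(actor_roles))
    record = {
        "action": action,
        "actor_id": actor_id,
        "decision_id": decision_id,
        "granted": granted,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    # Log before enforcing, so denied attempts are evidence too.
    audit_log.append(record)
    if not granted:
        raise OversightDenied(f"{actor_id} may not {action}")
    return record
```

Writing the record before raising is deliberate: a refused stop attempt is exactly the kind of event an auditor will want to see when assessing whether gating was enforced.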
For remote biometric identification, Article 14(5) adds a hard requirement most teams miss: no action or decision may be taken on the basis of an identification unless it has been "separately verified and confirmed by at least two natural persons with the necessary competence, training and authority." This is a four-eyes control with a narrow law-enforcement and border-control derogation. If you run biometric identification, your system must enforce that two distinct, authorised identities confirm a match before any downstream action, and the log record must capture both confirmations. In the log-record table above, that means the human-oversight action field must hold one confirmation per verifier, each tied to a distinct acting identity.
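The four-eyes rule is easy to get subtly wrong, for example by counting two confirmations from the same person. A minimal sketch, assuming confirmations arrive as dictionaries with hypothetical `verifier_id` and `confirmed` keys and that the set of authorised verifiers is maintained elsewhere:

```python
def verified_by_two(confirmations: list, authorised_ids: set) -> bool:
    """True only if two distinct authorised people confirmed the match."""
    verifiers = {
        c["verifier_id"]
        for c in confirmations
        if c["confirmed"] and c["verifier_id"] in authorised_ids
    }
    return len(verifiers) >= 2


def release_action(match_id: str, confirmations: list,
                   authorised_ids: set, audit_log: list) -> bool:
    """Decide whether downstream action may proceed, and record why."""
    released = verified_by_two(confirmations, authorised_ids)
    audit_log.append({
        "match_id": match_id,
        "confirmations": list(confirmations),
        "action_released": released,
    })
    return released
```

Collecting verifier IDs into a set is what enforces "distinct": a duplicate confirmation from one person collapses to a single entry and the check fails.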
The pattern across Article 14 is that oversight is not a meeting, it is a set of runtime controls with access gating and a complete record of who intervened, when, and how.
Own your evidence: the architecture decision the checklists skip
Here is the point every compliance-SaaS checklist on the first page of search results avoids, because their business model depends on you not thinking about it. The evidence the Act requires you to generate is itself sensitive, regulated data.
Look at what Article 12 makes you log: the inputs and outputs of decisions about who gets credit, who gets a job interview, who gets flagged by a biometric system, who gets a benefit. Those records are personal data about consequential decisions, and they are exactly the material a market surveillance authority, a data subject exercising access rights, or your own DPO will scrutinise. Now consider the dominant compliance pattern on offer: send your evidence to a vendor's cloud so the vendor can help you "prove" compliance. You have just exported the most sensitive byproduct of your regulated AI system to a third party, created a new data-processing relationship and a new exfiltration path, and made your ability to answer a regulator dependent on a vendor's availability and data-residency posture. You solved a logging requirement by creating a data-governance problem under the same Regulation.
Article 26(6) reinforces this: deployers must keep the automatically generated logs, to the extent those logs are under their control, for at least six months. That control is hard to demonstrate when the logs live in a vendor tenant you do not own. The clean architecture is the one where the evidence never leaves your perimeter:
- Generate logs at the point of action, synchronously, inside your own infrastructure, so there is no window where a record exists only in transit to a third party.
- Store them in a system you operate, with the retention policy enforced locally, so a six-month (or longer) retrieval never depends on a vendor relationship.
- Make them auditable in place, exporting a scoped, hashed extract to an auditor on request rather than continuously streaming everything outward.
- Enforce the oversight and access controls at the protocol layer, so every client, model, and agent inherits the same logging and the same gates without per-application reimplementation.
This is the same self-hosted, perimeter-bounded design argued in depth in the self-hosted AI governance guide, applied to the specific evidence the EU AI Act demands. The principle is simple: the records that prove you are compliant should not be the records that make you non-compliant somewhere else. An air-gap-capable, single-deployment governance layer that emits structured, tamper-evident logs into your own store satisfies Articles 12, 14, 19, and 26 without ever shipping a regulated decision to someone else's cloud.
A market surveillance authority under Article 74 can demand access to documentation and logs. When that happens, the question is not whether you have a compliance dashboard. It is whether you can produce the events around a specific incident, intact and provably unaltered, from a store you control. Build for that question.
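Producing "the events around a specific incident" as a scoped, hashed extract could look like the following sketch. It assumes events are dictionaries with an ISO 8601 `timestamp` key carrying a UTC offset in a consistent format; the `incident_extract` name and the shape of the returned bundle are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timedelta


def incident_extract(events: list, incident_at: datetime,
                     window: timedelta) -> dict:
    """Select events within +/- window of the incident and hash the set."""
    selected = [
        e for e in events
        if abs(datetime.fromisoformat(e["timestamp"]) - incident_at) <= window
    ]
    selected.sort(key=lambda e: e["timestamp"])
    payload = json.dumps(selected, sort_keys=True, separators=(",", ":"))
    return {
        "incident_at": incident_at.isoformat(),
        "window_seconds": int(window.total_seconds()),
        "events": selected,
        # Digest of the exact extract handed over, so the recipient can
        # later show it was not altered after export.
        "sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }
```

The extract is scoped by design: only the events inside the window leave the store, and the digest covers exactly what was exported and nothing more.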
An EU AI Act compliance checklist you can action
The articles above are the load-bearing controls. This section turns them into an ordered checklist you can run against a system, because the sequence matters: classification gates everything, and several items have to exist before others can be true. Work it top to bottom, per system, and keep the output of each step as evidence.
1. Inventory every AI system and agent in scope. You cannot govern what you have not listed. Build a register of every model, agent, and AI-assisted decision path that touches an EU user or operates from the EU, including the autonomous agents and third-party tools that teams stand up without telling security. An unknown system is an unassessed system, and an unassessed system is a finding waiting to happen.
2. Classify each system against Articles 5 and 6. For each entry, decide: is it a prohibited practice under Article 5 (stop immediately, it has been banned since 2 February 2025), high-risk under Article 6 and Annex III, subject only to transparency obligations under Article 50 (for example a chatbot or a generator of synthetic media), or minimal risk. Record the rationale. If you claim the Article 6(3) derogation for an Annex III use case, write the assessment down before deployment, because the Act requires you to produce it on request.
3. Stand up the Article 9 risk management process. For every high-risk system, create the versioned risk register, define the pre-release test gates, and wire post-market monitoring back into the register. The deliverable is not a document, it is a running loop with dated outputs per version.
4. Establish data governance under Article 10. Record the provenance of training, validation, and test data, run the bias and quality checks the Article expects, and document the results. This is also where you decide what input data you may lawfully retain in the logs versus what you must reference indirectly to stay inside data-protection limits.
5. Build the Annex IV technical file (Article 11). The technical documentation has to describe the system, its intended purpose, its risk management, and its logging, and it has to match the version actually deployed. Generate it from the build where you can, so it does not drift away from reality the moment the system changes.
6. Implement Article 12 logging and Article 19 retention. Stand up the structured, tamper-evident event log using the field design earlier in this guide, emit it synchronously on the request path, and enforce a retention policy of at least six months (longer where sector law requires). Confirm you can query the log for the events around an arbitrary incident, because that is what an authority will ask for.
7. Write Article 13 instructions for use. Document the log schema and how a deployer reads and interprets it. Article 13 turns your internal logging into something a downstream operator can actually use for their own oversight, and it is the bridge between provider and deployer duties.
8. Implement Article 14 human oversight. Ship the interpretability surface, the override path, the stop mechanism, and the access gating, and make sure each produces a log record. For biometric identification, enforce and log the two-person verification before any action.
9. Complete conformity assessment and registration. Depending on the system, this is either an internal conformity assessment or one involving a notified body, followed by the EU declaration of conformity, the CE marking where applicable, and registration in the EU database for high-risk systems. These are paperwork, but they point back at the controls in steps 3 through 8, so they are only as credible as the evidence underneath them.
10. Assign deployer custody under Article 26. Confirm that the organisation operating each system holds its logs under its own control for the retention period and knows how to respond to an Article 74 request from a market surveillance authority. This is the step that fails silently if your evidence lives in a third-party cloud, which is why the architecture in the previous section is part of the checklist, not separate from it.
Run this list per system and the abstract obligation of "EU AI Act compliance" becomes a finite, trackable backlog with an evidence artefact at the end of every item.
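The ordering constraint can be made explicit by treating the checklist as data. In the sketch below the step labels follow the list above, but the prerequisite sets are one assumed reading of "several items have to exist before others can be true", not something the Act or this guide fixes; adjust them to your own programme.

```python
# step number -> (label, prerequisite steps). Prerequisites are an
# assumption made for illustration.
CHECKLIST = {
    1: ("Inventory systems and agents", set()),
    2: ("Classify against Articles 5 and 6", {1}),
    3: ("Article 9 risk management loop", {2}),
    4: ("Article 10 data governance", {2}),
    5: ("Annex IV technical file", {3, 6}),
    6: ("Article 12 logging and Article 19 retention", {2}),
    7: ("Article 13 instructions for use", {6}),
    8: ("Article 14 human oversight", {6}),
    9: ("Conformity assessment and registration", {3, 4, 5, 6, 7, 8}),
    10: ("Article 26 deployer custody", {6}),
}


def actionable(evidence: dict) -> list:
    """Steps not yet evidenced whose prerequisites all have evidence.

    `evidence` maps a step number to a reference to its artefact; an
    empty reference counts as not done.
    """
    done = {step for step, ref in evidence.items() if ref}
    return [
        step for step, (_, prereqs) in sorted(CHECKLIST.items())
        if step not in done and prereqs <= done
    ]
```

For example, `actionable({1: "inventory-v3", 2: "classification-v1"})` returns steps 3, 4 and 6, which is the point at which risk management, data governance, and logging work can proceed in parallel under these assumed dependencies.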
The penalties that make this a board issue
The reason EU AI Act compliance gets budget is Article 99, and the numbers are large enough that the cost of building the controls is rounding error against the exposure.
Article 99 sets three tiers of administrative fines:
- Prohibited practices (Article 5 breaches): up to 35 million euros or 7 percent of total worldwide annual turnover, whichever is higher.
- Non-compliance with most high-risk obligations (including the provider and deployer duties around the controls in this guide): up to 15 million euros or 3 percent of total worldwide annual turnover, whichever is higher.
- Supplying incorrect, incomplete or misleading information to notified bodies or authorities: up to 7.5 million euros or 1 percent of total worldwide annual turnover, whichever is higher.
For SMEs and start-ups, the Article applies the lower of the fixed amount and the percentage, which softens the absolute figures but does not remove the obligation. The middle tier is the one most enterprises should plan against, because failing to keep adequate Article 12 logs or to implement Article 14 oversight sits squarely in the high-risk obligations band. A 3 percent-of-global-turnover exposure for a logging gap is what turns this from an engineering backlog item into a board-level risk.
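The tier arithmetic is worth making concrete. The sketch below encodes the three caps stated above and the higher-of versus lower-of rule; the tier keys and the `max_fine_eur` name are hypothetical, amounts are whole euros, and the result is the statutory ceiling, not a prediction of any actual fine.

```python
# tier -> (fixed cap in euros, percent of total worldwide annual turnover)
TIERS = {
    "prohibited_practice": (35_000_000, 7),
    "high_risk_obligation": (15_000_000, 3),
    "misleading_information": (7_500_000, 1),
}


def max_fine_eur(tier: str, worldwide_turnover_eur: int,
                 is_sme: bool = False) -> int:
    """Upper bound of the fine for a tier, in whole euros."""
    fixed_cap, percent = TIERS[tier]
    turnover_cap = worldwide_turnover_eur * percent // 100
    if is_sme:
        # SMEs and start-ups: the lower of the two amounts applies.
        return min(fixed_cap, turnover_cap)
    return max(fixed_cap, turnover_cap)
```

For an enterprise with 2 billion euros of turnover, the high-risk tier ceiling is 60 million euros (3 percent exceeds the 15 million fixed cap). For an SME with 10 million euros of turnover, the same tier is capped at 300,000 euros.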
The point is not fear. It is proportionality: the controls in this guide are a few weeks of focused engineering for a system you already operate, and they retire a liability measured in percentages of global revenue. That is the trade an auditor, a CFO, and a CISO will all recognise.
How the EU AI Act fits your wider governance work
The EU AI Act is binding law, but it is not the only framework you will be asked about, and you should not build a separate compliance silo for it. The NIST AI Risk Management Framework gives you the risk vocabulary and the control structure that Article 9 expects, and ISO/IEC 42001 gives you the certifiable management system that holds the whole programme together. The crosswalk between those standards and the EU AI Act is covered in those guides; the value of mapping them is that one control structure answers several questions at once, so the Article 12 log you build for the EU AI Act is the same audit evidence your ISO 42001 assessor and your SIEM both want.
Treat the EU AI Act as the binding floor and the standards as the operating system, and the work compounds instead of duplicating.
Conclusion
Build the controls before the deadline, not the documentation after it. Classify every system against Article 6 and Annex III, then for each high-risk one stand up the three controls that carry the weight: a continuous Article 9 risk register fed by live monitoring, an automatic Article 12 event log that is structured, tamper-evident, and retained for at least six months, and the Article 14 oversight functions with their override, stop, and access records. Keep all of it inside your own perimeter so your evidence never becomes someone else's breach. For the surrounding structure, map this work onto your AI governance framework and ISO 42001 programme rather than running it alone.