AI Act Readiness — Architecture & Compliance Map

Scope. This document describes how the rag-showcase reference architecture maps to the obligations of Regulation (EU) 2024/1689 (the "AI Act"). It serves two audiences: prospects evaluating whether the architecture would be defensible if productionised in the EU, and internal reviewers checking that we haven't over-claimed.

Status of the showcase itself. This deployment is a non-production demonstration for sales and evaluation purposes. We treat it as excluded from the substantive obligations of the Act under Article 2(8) (research, testing or development activity prior to placing on the market or putting into service). The one obligation we apply regardless of stage is Art. 50 user-facing transparency, which we satisfy via the persistent disclosure under the chat input: "KI-generierte Antworten · Demo-System, nicht für den Produktiveinsatz" ("AI-generated answers · demo system, not for production use").

What follows is therefore productionisation guidance: how a customer-side deployment of this architecture would line up against the Act, and which design choices in the codebase already do the work.

1 · Risk classification

The reference use case — retrieval-augmented Q&A over a tenant's own technical, service or product documentation, with literal-quote citations to the source — is limited risk under the Act:

Not in Annex III (no biometric ID, no education / employment / credit / migration / law-enforcement / democratic-process decisions, no critical-infrastructure control).
Not subject to GPAI provider obligations: we deploy Azure-hosted models; we do not train a general-purpose model or place one on the market. Provider obligations (Art. 53–55) sit with Azure / OpenAI.
Not a prohibited practice under Art. 5 (no subliminal manipulation, no social scoring, no real-time biometric ID, no emotion inference in workplace/education, no scraping for facial recognition databases).

Caveat for deployers. If a deployer routes the assistant into a workflow that is high-risk in its own right (for example, by integrating its outputs into the safety chain of machinery under Regulation (EU) 2023/1230, or into product-safety decision-making), the use can inherit high-risk status. That is a deployer-side judgement, not a property of this architecture. The architecture supports either posture; the obligations below are written for the limited-risk deployment.

2 · Obligations map

Each entry lists the Act obligation, the design choice that addresses it, the file(s) where that choice lives, and any gap a productionising customer would still need to close themselves.

Art. 4 — AI literacy

Aspect	Status
Obligation	Providers and deployers must take measures to ensure, to their best extent, a sufficient level of AI literacy among their staff and other persons operating or using the system on their behalf.
Architecture support	None required from code.
Productionisation gap	One-pager covering: what RAG is, the system's intended use and limits, the difference between retrieval and generation, when to defer to the source PDF rather than the answer. To be drafted per-tenant during onboarding.

Art. 50(1) — Disclosure of AI interaction

Aspect	Status
Obligation	Natural persons interacting with an AI system must be informed that they are interacting with an AI system, unless it is obvious from the circumstances.
Architecture support	Persistent disclosure under the chat input: "KI-generierte Antworten · Demo-System, nicht für den Produktiveinsatz". Production deployments drop the "Demo-System" half and keep the "KI-generiert" line.
File(s)	`apps/web/src/components/ChatPanel.tsx` (`.ai-disclosure` block); `apps/web/src/app/showcase.css` (`.ai-disclosure` rule).
Productionisation gap	Swap copy via the tenant config (`apps/web/src/lib/tenants.ts`) if a tenant wants brand-specific wording.
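The copy swap described above could look roughly like the following. This is a minimal sketch, not the contents of `tenants.ts`: the `TenantCopy` shape, the `aiDisclosure` field and the default string are assumptions made for illustration. The one design point it shows is that an empty override falls back to the default, so a tenant can reword the disclosure but cannot configure it away.

```typescript
// Hypothetical tenant-config shape; the real tenants.ts may differ.
interface TenantCopy {
  id: string;
  aiDisclosure?: string; // optional brand-specific wording
}

// Assumed production default: the demo string without its "Demo-System" half.
const DEFAULT_AI_DISCLOSURE = "KI-generierte Antworten";

// Return the tenant's wording, or the default when none (or only whitespace) is set.
function disclosureFor(tenant: TenantCopy): string {
  const custom = (tenant.aiDisclosure ?? "").trim();
  return custom.length > 0 ? custom : DEFAULT_AI_DISCLOSURE;
}

const brandedCopy = disclosureFor({ id: "tenant-a", aiDisclosure: "Antworten KI-generiert" });
const fallbackCopy = disclosureFor({ id: "tenant-b", aiDisclosure: "   " });
```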

Art. 50(2) — Marking of AI-generated output

Aspect	Status
Obligation	Output of an AI system generating synthetic content must be marked in a machine-readable format and detectable as artificially generated.
Architecture support	Each answer carries inline citations (`[1]`, `[2]` …) resolving to specific chunks of specific source documents, rendered as clickable references with a page-inspector modal. The literal-quote rule in the generator prompt means substantive claims are anchored to a verbatim span in the corpus. This is provenance rather than a watermark: it shows a reader where each claim comes from, but it is not in itself a machine-readable marker.
File(s)	`apps/web/src/app/api/answer/route.ts`, `apps/web/src/app/api/chat/route.ts` (prompt); `apps/web/src/components/MarkdownAnswer.tsx` (citation rendering); `apps/web/src/components/PageInspectorModal.tsx` (source view).
Productionisation gap	If a deployer redistributes generated text outside the chat UI (e.g. exports to PDF), they should attach a machine-readable marker (`<meta name="ai-generated" …>` or C2PA manifest) at the redistribution boundary.
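One way to attach a marker at the redistribution boundary is a small JSON envelope around the exported text. The sketch below is an assumption, not an existing feature: the field names are illustrative and do not follow C2PA or any other standard, and a real export would more likely embed the marker in the target format's own metadata.

```typescript
// Hypothetical sidecar format; field names are illustrative, not a standard.
interface AiGeneratedMarker {
  aiGenerated: true;
  generator: string;      // free-text system label chosen by the deployer
  generatedAt: string;    // ISO 8601 timestamp
  citedSources: string[]; // source filenames behind the inline citations
}

interface MarkedExport {
  marker: AiGeneratedMarker;
  body: string;
}

// Wrap an answer in the envelope at the moment it leaves the chat UI.
function markForExport(body: string, generator: string, citedSources: string[], now: Date): MarkedExport {
  return {
    marker: {
      aiGenerated: true,
      generator,
      generatedAt: now.toISOString(),
      citedSources: citedSources.slice(),
    },
    body,
  };
}

// Downstream check: does an arbitrary parsed JSON value carry the marker?
function isMarkedAiGenerated(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const marker = (value as { marker?: unknown }).marker;
  if (typeof marker !== "object" || marker === null) return false;
  return (marker as { aiGenerated?: unknown }).aiGenerated === true;
}

const exported = markForExport("Example answer text [1]", "example-assistant", ["manual.pdf"], new Date("2025-01-01T00:00:00Z"));
const roundTripped: unknown = JSON.parse(JSON.stringify(exported));
```

The marker survives serialisation, so a downstream tool can detect it without parsing the answer text.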

Art. 10 / 12 / 13 / 14 / 15 — Data governance, record-keeping, transparency, human oversight, accuracy (high-risk only, but architecture-relevant)

These articles bind only high-risk systems. They are called out because customers in regulated industries will ask whether the architecture could support a high-risk posture without a rewrite. It largely can; the one open gap is Art. 12 logging:

Article	What the architecture already gives you
Art. 13 — Transparency and provision of information to deployers	Per-chunk citation, page-bitmap inspector, source filename and page number on every claim. The retrieval payload is structured (zod-validated) and can be exported.
Art. 14 — Human oversight	Read-only assistant. Outputs do not actuate anything; the human is unavoidably in the loop. The page inspector is designed for verification, not just display.
Art. 15 — Accuracy, robustness, cybersecurity	Hybrid retrieval (HNSW + tsvector/GIN) + RRF + listwise rerank reduces single-strategy failure modes. Eval harness (`eval/`) measures retrieval recall and answer-fact recall per question, including bidirectional cross-tenant leak tests. Tenant isolation enforced at three layers (routing, DB extension, filesystem) — see Section 3.
Art. 10 — Data and data governance	The pipeline does not train on retrieved content. The only data flowing is the tenant's own corpus, ingested under the deployer's control. Per-tenant filesystem and DB partitioning are documented in `CLAUDE.md` under "Multi-tenant isolation, at a glance".
Art. 12 — Record-keeping (logging)	Gap. No interaction log is persisted today. See Section 4.
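For readers unfamiliar with the fusion step named in the Art. 15 row, Reciprocal Rank Fusion (RRF) can be sketched as below. This is a generic illustration of the technique, not the project's retrieval code: the chunk IDs are made up, and the constant `k = 60` is a commonly used default rather than a value taken from this codebase.

```typescript
// Reciprocal Rank Fusion: merge several ranked lists of chunk IDs.
// Each list contributes 1 / (k + rank) per item, with rank starting at 1.
type RankedList = readonly string[];

function reciprocalRankFusion(lists: readonly RankedList[], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((chunkId, index) => {
      const contribution = 1 / (k + index + 1);
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + contribution);
    });
  }
  // Sort by fused score, highest first; break ties by ID for determinism.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map((entry) => entry[0]);
}

// Hypothetical rankings: one from vector search, one from full-text search.
const vectorHits = ["c3", "c1", "c7"];
const keywordHits = ["c1", "c9", "c3"];
const fused = reciprocalRankFusion([vectorHits, keywordHits]);
// fused is ["c1", "c3", "c9", "c7"]
```

Here `c1` wins because it sits near the top of both lists, while `c7` and `c9` each appear in only one. That is the sense in which fusion reduces single-strategy failure modes: a chunk missed by one retriever can still surface through the other.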

Art. 53–55 — General-Purpose AI (GPAI) flow-down

We do not place a GPAI model on the market; we use Azure-hosted models. The provider obligations sit with Azure (and upstream model authors). What we owe a customer is a clean inventory of which hosted models we depend on, so the customer can verify provider compliance against their own procurement standards.

Aspect	Status
Architecture support	Two model registries, each the single source of truth for its app: `apps/web/src/lib/ai-models.ts` and `apps/parser/src/rag_parser/core/models.py`. Each entry names the Azure deployment, its surface (`openai-v1` or `models-inference`), and its role (generator / rerank / multiquery / embedding / VLM / contextualize / page-title).
Productionisation gap	Publish the registries as a model-inventory page (auto-generated from the two files). Include the upstream provider name and the link to their GPAI compliance statement.
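A generated inventory could be as simple as the sketch below. The `ModelEntry` shape is an assumption based on the fields listed above; `provider` and `complianceUrl` are the hypothetical additions the gap calls for, and the deployment names and URL are placeholders, not real registry contents.

```typescript
// Hypothetical registry entry; the real maps in ai-models.ts / models.py may differ.
type Surface = "openai-v1" | "models-inference";
type Role = "generator" | "rerank" | "multiquery" | "embedding" | "vlm" | "contextualize" | "page-title";

interface ModelEntry {
  deployment: string;           // Azure deployment name
  surface: Surface;
  role: Role;
  provider: string;             // upstream provider (assumed new field)
  complianceUrl: string | null; // provider's GPAI statement, if known (assumed new field)
}

// Render one text row per model, sorted by role, flagging missing compliance links.
function renderInventory(entries: readonly ModelEntry[]): string[] {
  const header = "Role | Deployment | Surface | Provider | GPAI compliance statement";
  const rows = entries
    .slice()
    .sort((a, b) => a.role.localeCompare(b.role))
    .map((m) => [m.role, m.deployment, m.surface, m.provider, m.complianceUrl ?? "MISSING"].join(" | "));
  return [header, ...rows];
}

// Placeholder entries for illustration only.
const exampleRegistry: ModelEntry[] = [
  { deployment: "example-generator", surface: "openai-v1", role: "generator", provider: "Example Provider", complianceUrl: "https://example.com/gpai" },
  { deployment: "example-embedding", surface: "models-inference", role: "embedding", provider: "Example Provider", complianceUrl: null },
];
const inventoryLines = renderInventory(exampleRegistry);
```

Rendering a visible `MISSING` instead of dropping the row keeps an incomplete inventory obvious to a reviewer.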

3 · Multi-tenant isolation (defence-in-depth)

Single deployment, single Postgres, multiple tenants. Three layers, all funnelling through tenantId:

Routing. `apps/web/src/middleware.ts` matches the request `Host` against `tenant.hosts[]` and stamps `x-tenant: <id>`. In dev, `?tenant=<id>` overrides the host match. Server components read the tenant via `getTenantFromHeaders()`; client components receive a serialised prop from the root layout.
DB chokepoint. `dbFor(tenantId)` in `apps/web/src/lib/db.ts` returns a Prisma extension that auto-injects `where.tenantId` on reads, stamps `data.tenantId` on creates, and refuses any update/delete/upsert whose `where` selector doesn't include `tenantId`. The composite unique keys `(tenantId, contentHash)` on `Document` and `(documentId, chunkId)` on `Chunk` make that natural. Raw SQL bypasses the extension by design; the only raw-SQL call sites (in `apps/web/src/lib/search.ts`) add the filter explicitly with parameter binding.
Filesystem. Every served asset is namespaced by tenant: `public/figures/<tenantId>/<docId>/…`, `public/pages/<tenantId>/<docId>/…`, `corpus/<tenantId>/*.pdf`.
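The three layers can be illustrated with plain functions. This is a simplified sketch and not the code in `middleware.ts` or `db.ts`: it uses no Next.js or Prisma APIs, the tenants are made up, the write guard is slightly stricter than described above (it requires the selector's `tenantId` to equal the bound tenant), and the `file` parameter stands in for the elided part of the asset path.

```typescript
interface Tenant {
  id: string;
  hosts: string[];
}

// Hypothetical tenants for illustration only.
const TENANTS: Tenant[] = [
  { id: "tenant-a", hosts: ["a.example.com"] },
  { id: "tenant-b", hosts: ["b.example.com", "docs.b.example.com"] },
];

// Layer 1 (routing): map the request Host header to a tenant ID.
function resolveTenant(host: string, tenants: readonly Tenant[]): string | null {
  const hostname = host.toLowerCase().replace(/:\d+$/, ""); // drop any port
  const match = tenants.find((t) => t.hosts.some((h) => h === hostname));
  return match ? match.id : null;
}

type Where = Record<string, unknown>;

// Layer 2 (DB): reads always get the bound tenantId injected,
// overriding whatever the caller passed.
function scopedRead(tenantId: string, where: Where = {}): Where {
  return { ...where, tenantId };
}

// Layer 2 (DB): writes are refused unless the selector names the bound tenant.
function assertScopedWrite(tenantId: string, where: Where): Where {
  if (where["tenantId"] !== tenantId) {
    throw new Error("Refusing write: where selector is not scoped to tenant " + tenantId);
  }
  return where;
}

// Layer 3 (filesystem): assets live under a per-tenant prefix.
function figurePath(tenantId: string, docId: string, file: string): string {
  const parts = [tenantId, docId, file];
  if (parts.some((p) => p.length === 0 || p.indexOf("/") !== -1 || p.indexOf("..") !== -1)) {
    throw new Error("Invalid path segment");
  }
  return "public/figures/" + parts.join("/");
}
```

An unknown host resolves to `null` rather than to a default tenant, so a routing mistake fails closed instead of serving another tenant's data.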

Bidirectional leak detection lives in the eval suites: each tenant's fixture includes a few questions targeting other tenants' vocabulary, which the assistant must refuse to answer (e.g. `crosstenant-sew-f11` in `questions.porsche.json`).
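The pass condition for such a leak question can be stated in a few lines. The fixture and result shapes below are hypothetical and not the schema used under `eval/`; the sketch only shows the rule that a cross-tenant question passes when the assistant refuses and cites nothing from another tenant.

```typescript
// Hypothetical fixture and result shapes; the real eval/ schema may differ.
interface LeakQuestion {
  id: string;
  askingTenant: string; // tenant whose assistant receives the question
  question: string;
}

interface AnswerResult {
  refused: boolean;
  citedChunkTenants: string[]; // owning tenant of each cited chunk
}

// Pass only if the assistant refused and no cited chunk belongs to another tenant.
function passesLeakTest(q: LeakQuestion, result: AnswerResult): boolean {
  const citesForeign = result.citedChunkTenants.some((t) => t !== q.askingTenant);
  return result.refused && !citesForeign;
}

const leakQuestion: LeakQuestion = {
  id: "crosstenant-example-1",
  askingTenant: "tenant-a",
  question: "A question that uses tenant-b vocabulary",
};
const cleanRefusal = passesLeakTest(leakQuestion, { refused: true, citedChunkTenants: [] });
const leakedAnswer = passesLeakTest(leakQuestion, { refused: false, citedChunkTenants: ["tenant-b"] });
```

Running the same check from each tenant's side is what makes the detection bidirectional.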

This is not a formal Act obligation at limited risk — but it is the single most-asked question in customer security review, and the architecture answers it cleanly.

4 · Gaps a productionising deployer would close

Listed honestly, in priority order, with rough effort.

1. Interaction logging (~½ day). Persist per request: tenant, timestamp, query, retrieved chunk IDs, model versions used, answer text, citation set. Append-only, retention configurable per tenant. Not legally required at limited risk, but needed in practice the moment a deployer asks "what did the system tell my engineer last Tuesday?" or a use case drifts toward high-risk.

   Adjacent, not the same thing. The showcase already persists visit-level analytics (the `TenantVisit` table, written by the `/api/track` beacon: tenant, timestamp, optional `?utm_recipient=` name, opaque visitor cookie, UA/locale/screen fingerprint). It feeds the password-gated admin dashboard used for sales analytics; the `rag-debug=true` cookie acts as an opt-out that short-circuits the insert server-side. Visit logging is not AI-request logging: it records that a page loaded, not the question asked or the answer returned. A productionising deployer would either layer request-level logging on top (recommended for high-risk postures) or strip the visit beacon entirely (recommended for deployments where even page-load analytics are out of scope).
2. Model inventory page (~2 hours). Auto-render the two `MODELS` maps as a public `/models` page. Include provider name and GPAI compliance link per entry.
3. System / model card (~½ day). Intended use, limitations (see `CLAUDE.md` § "Known limitations"), languages supported (currently German end-to-end), corpus scope, evaluation results from `eval/`, escalation contact. Public.
4. Tenant-side AI literacy briefing (~2 hours per tenant). One-pager covering what RAG is, when to defer to the source PDF, and the system's intended use boundary.
5. Incident reporting hook (~½ day). A "report a wrong answer" button on each assistant message, writing to the interaction log with a flag. Becomes the input to the deployer's incident process.
6. Export-time output marking (future). If/when the assistant gains export-to-PDF or share-link functionality, attach a machine-readable AI-generated marker at that boundary.
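As a rough shape for the interaction-logging gap, the record and its retention rule could look like this. It is a sketch under stated assumptions: the field names are illustrative, the in-memory class stands in for what would be a database table, and nothing like it exists in the codebase today.

```typescript
// Hypothetical record shape covering the fields listed for interaction logging.
interface InteractionRecord {
  tenantId: string;
  timestamp: string;                     // ISO 8601
  query: string;
  retrievedChunkIds: string[];
  modelVersions: Record<string, string>; // role -> model version
  answerText: string;
  citations: number[];                   // citation markers used in the answer
}

// In-memory stand-in for an append-only table with per-tenant retention.
class InteractionLog {
  private entries: ReadonlyArray<Readonly<InteractionRecord>> = [];

  // Writers can only append; stored records are (shallowly) frozen copies.
  append(record: InteractionRecord): number {
    this.entries = [...this.entries, Object.freeze({ ...record })];
    return this.entries.length - 1;
  }

  forTenant(tenantId: string): ReadonlyArray<Readonly<InteractionRecord>> {
    return this.entries.filter((e) => e.tenantId === tenantId);
  }

  // Drop records older than the tenant's retention window; returns how many were purged.
  applyRetention(now: Date, retentionDays: Record<string, number>, defaultDays: number): number {
    const before = this.entries.length;
    const dayMs = 24 * 60 * 60 * 1000;
    this.entries = this.entries.filter((e) => {
      const days = retentionDays[e.tenantId] ?? defaultDays;
      const ageMs = now.getTime() - new Date(e.timestamp).getTime();
      return ageMs <= days * dayMs;
    });
    return before - this.entries.length;
  }
}

const interactionLog = new InteractionLog();
const sampleRecord: InteractionRecord = {
  tenantId: "tenant-a",
  timestamp: "2025-01-01T00:00:00Z",
  query: "example question",
  retrievedChunkIds: ["c1", "c3"],
  modelVersions: { generator: "example-version" },
  answerText: "example answer [1]",
  citations: [1],
};
interactionLog.append(sampleRecord);
interactionLog.append({ ...sampleRecord, tenantId: "tenant-b" });
// Thirty days later: tenant-a keeps 7 days, tenant-b falls back to the 90-day default.
const purgedCount = interactionLog.applyRetention(new Date("2025-01-31T00:00:00Z"), { "tenant-a": 7 }, 90);
```

Keeping retention as a separate purge step, rather than letting writers delete, preserves the append-only property for everything inside the retention window.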

None of these require architectural changes. The system is deliberately designed so the compliance surface is additive.

5 · What this document is not

It is not legal advice. It is an engineering self-assessment.
It is not a Conformity Assessment under Art. 43 (which applies to high-risk systems only).
It does not bind any customer to a particular risk classification of their downstream use. Deployers retain that determination.

6 · Maintenance

When any of the following change, update the relevant section:

Tenant config / routing logic → Section 3.
Model registries (ai-models.ts, models.py) → Section 2, Art. 53–55 row.
Generator prompt (literal-quote rule) → Section 2, Art. 50(2) row.
Eval coverage (eval/questions.*.json) → Section 2, Art. 15 row.
UI disclosure copy → Section 2, Art. 50(1) row.
Visit-tracking surface (/api/track, TenantVisit schema, rag-debug opt-out) → Section 4, gap #1.