Scope. This document describes how the rag-showcase reference architecture maps to the obligations of Regulation (EU) 2024/1689 (the "AI Act"). It serves two audiences: prospects evaluating whether the architecture would be defensible if productionised in the EU, and internal reviewers checking that we haven't over-claimed.
Status of the showcase itself. This deployment is a non-production demonstration for sales and evaluation purposes. It is excluded from the substantive obligations of the Act by Article 2(8) (research, testing and development prior to placing on the market or putting into service). The only obligation that applies regardless of stage is Art. 50 user-facing transparency, which we satisfy via the persistent "KI-generierte Antworten · Demo-System, nicht für den Produktiveinsatz" ("AI-generated answers · demo system, not for production use") disclosure under the chat input.
What follows is therefore productionisation guidance: how a customer-side deployment of this architecture would line up against the Act, and which design choices in the codebase already do the work.
The reference use case — retrieval-augmented Q&A over a tenant's own technical, service or product documentation, with literal-quote citations to the source — is limited risk under the Act:
Caveat for deployers. If a deployer routes the assistant into a workflow that is high-risk in their own right — e.g. integrating its outputs into the safety chain of machinery under Regulation (EU) 2023/1230, or into product-safety decision-making — the use can inherit high-risk status. That is a deployer-side judgement, not a property of this architecture. The architecture supports either posture; the obligations below are written for the limited-risk deployment.
Each row: the Act obligation, the design choice that addresses it, the file(s) where that choice lives, and any gap a productionising customer would still need to close themselves.
These articles bind only providers of high-risk systems. They are called out because customers in regulated industries will ask whether the architecture could support a high-risk posture without a rewrite. It can:
We do not place a GPAI model on the market; we use Azure-hosted models. The GPAI provider obligations (Art. 53–55) sit with Azure and the upstream model authors. What we owe a customer is a clean inventory of which hosted models we depend on, so the customer can verify provider compliance against their own procurement standards.
Single deployment, single Postgres, multiple tenants. Three layers,
all funnelling through tenantId:
1. Routing. apps/web/src/middleware.ts matches the request Host against tenant.hosts[] and stamps x-tenant: <id>; in dev, ?tenant=<id> overrides. Server components read the tenant via getTenantFromHeaders(); client components receive it as a serialised prop from the root layout.
2. Database. dbFor(tenantId) in apps/web/src/lib/db.ts returns a Prisma extension that auto-injects where.tenantId on reads, stamps data.tenantId on creates, and refuses any update/delete/upsert whose where selector doesn't include tenantId (the composite unique keys (tenantId, contentHash) on Document and (documentId, chunkId) on Chunk make that natural). Raw SQL bypasses the extension by design; the only raw-SQL callers (apps/web/src/lib/search.ts) add the filter explicitly with parameter binding.
3. Files. Static assets and source PDFs are namespaced per tenant: public/figures/<tenantId>/<docId>/…, public/pages/<tenantId>/<docId>/…, corpus/<tenantId>/*.pdf.
Bidirectional leak detection lives in the eval suites: each tenant's fixture includes a few questions targeting other tenants' vocabulary that MUST refuse (e.g. crosstenant-sew-f11 in questions.porsche.json).
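The sketch below shows one way a per-tenant refusal check could look. It is a sketch, not the eval runner itself. The fixture field mustRefuse, the result shape (refused, citedTenantIds) and the ask callback are hypothetical stand-ins for whatever the eval runner actually uses, and ask is synchronous for brevity. The citation check is an extra assumption on top of the refusal rule described above. Running the check once per tenant, each time with probes aimed at the other tenants, is what makes the detection bidirectional.

```typescript
// Hypothetical fixture entry; the real questions.<tenant>.json schema may differ.
interface EvalQuestion {
  id: string; // e.g. "crosstenant-sew-f11"
  question: string;
  mustRefuse: boolean; // true for probes built from another tenant's vocabulary
}

// Hypothetical view of one assistant response, reduced to what the check needs.
interface AssistantResult {
  refused: boolean;
  citedTenantIds: string[]; // owning tenant of each cited chunk
}

// Stand-in for the pipeline under test (synchronous here for brevity).
type Ask = (tenantId: string, question: string) => AssistantResult;

// Returns human-readable failures; an empty array means no leak was detected.
function checkCrossTenantLeaks(
  tenantId: string,
  questions: EvalQuestion[],
  ask: Ask,
): string[] {
  const failures: string[] = [];
  for (const q of questions) {
    const result = ask(tenantId, q.question);
    // Probes aimed at another tenant's vocabulary must be refused.
    if (q.mustRefuse && !result.refused) {
      failures.push(`${q.id}: expected a refusal`);
    }
    // Independently of refusal, no citation may point outside the tenant.
    const foreign = result.citedTenantIds.filter((t) => t !== tenantId);
    if (foreign.length > 0) {
      failures.push(`${q.id}: cited other tenant(s) ${foreign.join(", ")}`);
    }
  }
  return failures;
}
```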
This is not a formal Act obligation at limited risk — but it is the single most-asked question in customer security review, and the architecture answers it cleanly.
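The routing and database layers reduce to small rules that can be unit-tested in isolation. The sketch below is illustrative only. The Tenant shape and the names resolveTenantId, whereScopesTenant and assertScopedMutation are hypothetical, not the code in middleware.ts or db.ts. It makes three further assumptions:

- tenant.hosts[] stores bare, lowercase hostnames.
- The check compares the tenantId value against the bound tenant, which is slightly stricter than "the selector includes tenantId".
- Composite keys follow Prisma's default compound-unique naming (tenantId_contentHash).

```typescript
// Hypothetical tenant record; assumes hosts[] holds bare, lowercase hostnames.
interface Tenant {
  id: string;
  hosts: string[];
}

// Routing layer: the tenant id middleware would stamp as `x-tenant`.
// In dev, a `?tenant=<id>` override wins, but only for known tenants.
function resolveTenantId(
  host: string,
  tenants: Tenant[],
  opts: { isDev: boolean; tenantOverride?: string } = { isDev: false },
): string | null {
  const override = opts.tenantOverride;
  if (opts.isDev && override) {
    return tenants.some((t) => t.id === override) ? override : null;
  }
  const bareHost = host.toLowerCase().replace(/:\d+$/, ""); // drop any port
  const match = tenants.find((t) => t.hosts.includes(bareHost));
  return match ? match.id : null;
}

type MutationOp = "update" | "delete" | "upsert";

// Database layer: the refusal rule for mutations. The selector must pin the
// bound tenant either directly or inside a compound-unique key object such as
// `{ tenantId_contentHash: { tenantId, contentHash } }`.
function whereScopesTenant(where: Record<string, unknown>, tenantId: string): boolean {
  if (where["tenantId"] === tenantId) return true;
  return Object.entries(where).some(([key, value]) => {
    if (!key.split("_").includes("tenantId")) return false;
    if (typeof value !== "object" || value === null) return false;
    return (value as Record<string, unknown>)["tenantId"] === tenantId;
  });
}

function assertScopedMutation(
  op: MutationOp,
  where: Record<string, unknown>,
  tenantId: string,
): void {
  if (!whereScopesTenant(where, tenantId)) {
    throw new Error(`refusing ${op}: where selector does not pin tenant ${tenantId}`);
  }
}
```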
Listed honestly, in priority order, with rough effort.
Interaction logging (~½ day). Persist per request: tenant, timestamp, query, retrieved chunk IDs, model versions used, answer text and citation set. Append-only, with retention configurable per tenant. Not legally required at limited risk, but needed in practice the moment a deployer asks "what did the system tell my engineer last Tuesday?" or the use case drifts toward high-risk.
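A minimal in-memory sketch of such a log follows. The record fields mirror the list above. The class name, the retentionDays callback and purgeExpired are hypothetical. "Append-only" here means records are never updated and leave only through the retention purge. In Postgres, the same property is commonly enforced by granting the application role INSERT and SELECT but not UPDATE or DELETE, and leaving deletion to a separate retention job.

```typescript
// One row per assistant request; fields mirror the list in the text.
interface InteractionRecord {
  readonly tenantId: string;
  readonly timestamp: string; // ISO 8601, UTC
  readonly query: string;
  readonly retrievedChunkIds: readonly string[];
  readonly modelVersions: Readonly<Record<string, string>>; // role -> model version
  readonly answer: string;
  readonly citations: readonly string[];
}

const DAY_MS = 24 * 60 * 60 * 1000;

class AppendOnlyInteractionLog {
  private records: InteractionRecord[] = [];
  // Per-tenant retention in days, supplied by a (hypothetical) policy callback.
  private readonly retentionDays: (tenantId: string) => number;

  constructor(retentionDays: (tenantId: string) => number) {
    this.retentionDays = retentionDays;
  }

  append(record: InteractionRecord): void {
    // Shallow freeze: the stored copy's fields cannot be reassigned.
    this.records.push(Object.freeze({ ...record }));
  }

  forTenant(tenantId: string): InteractionRecord[] {
    return this.records.filter((r) => r.tenantId === tenantId);
  }

  // The only removal path: drop records older than their tenant's retention window.
  purgeExpired(now: Date): void {
    this.records = this.records.filter((r) => {
      const cutoff = now.getTime() - this.retentionDays(r.tenantId) * DAY_MS;
      return Date.parse(r.timestamp) >= cutoff;
    });
  }
}
```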
Adjacent, not the same thing. The showcase already persists
visit-level analytics (TenantVisit table, written by the
/api/track beacon — tenant, timestamp, optional
?utm_recipient= name, opaque visitor cookie, UA/locale/screen
fingerprint). It feeds the password-gated admin dashboard used
for sales analytics; setting the rag-debug=true cookie pins an
opt-out that short-circuits the insert server-side. Visit logging
is not AI-request logging: it
records that a page loaded, not the question asked or the
answer returned. A productionising deployer would either layer
request-level logging on top (recommended for high-risk
postures) or strip the visit beacon entirely (recommended for
deployments where even page-load analytics are out of scope).
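The opt-out amounts to a guard evaluated before the insert. In the sketch below, parseCookies, shouldRecordVisit and handleTrackBeacon are hypothetical names. The insertVisit callback stands in for the TenantVisit write, and the cookie parser deliberately ignores quoting and encoding edge cases. This sketch returns the same status whether or not a row was written, so the beacon response does not reveal the opt-out.

```typescript
// Minimal Cookie-header parser (ignores quoting and percent-encoding).
function parseCookies(header: string | null): Map<string, string> {
  const cookies = new Map<string, string>();
  if (!header) return cookies;
  for (const part of header.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    cookies.set(part.slice(0, eq).trim(), part.slice(eq + 1).trim());
  }
  return cookies;
}

// False when the rag-debug=true opt-out cookie is present.
function shouldRecordVisit(cookieHeader: string | null): boolean {
  return parseCookies(cookieHeader).get("rag-debug") !== "true";
}

// Server-side short-circuit: the insert is skipped entirely for opted-out browsers.
function handleTrackBeacon(cookieHeader: string | null, insertVisit: () => void): number {
  if (shouldRecordVisit(cookieHeader)) {
    insertVisit();
  }
  return 204; // same response either way
}
```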
Model inventory page (~2 hours). Auto-render the two
MODELS maps (ai-models.ts, models.py) as a public /models page.
Include the provider name and a link to the provider's GPAI
compliance documentation for each entry.
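The rendering step for that page could be as small as the sketch below. It assumes each MODELS map can be read as role → { deployment, provider, complianceUrl }, for instance with the Python map exported to JSON at build time. Those field names, buildInventory and renderInventoryHtml are hypothetical. Sorting by source file and then role keeps the page stable between deploys, so a diff of it can double as a change log of model dependencies.

```typescript
// Hypothetical entry shape; the real MODELS maps may carry different fields.
interface ModelEntry {
  deployment: string; // hosted deployment the code calls
  provider: string; // GPAI provider of the underlying model
  complianceUrl: string; // provider's published GPAI compliance documentation
}

type ModelsMap = Record<string, ModelEntry>; // role -> entry

interface InventoryRow extends ModelEntry {
  source: string; // file the entry came from, e.g. "ai-models.ts"
  role: string;
}

// Flatten every map into one list, sorted by source file, then role.
function buildInventory(sources: Record<string, ModelsMap>): InventoryRow[] {
  const rows: InventoryRow[] = [];
  for (const [source, models] of Object.entries(sources)) {
    for (const [role, entry] of Object.entries(models)) {
      rows.push({ ...entry, source, role });
    }
  }
  return rows.sort(
    (a, b) => a.source.localeCompare(b.source) || a.role.localeCompare(b.role),
  );
}

function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Plain HTML table for a public /models page.
function renderInventoryHtml(rows: InventoryRow[]): string {
  const body = rows
    .map(
      (r) =>
        `<tr><td>${escapeHtml(r.source)}</td><td>${escapeHtml(r.role)}</td>` +
        `<td>${escapeHtml(r.deployment)}</td><td>${escapeHtml(r.provider)}</td>` +
        `<td><a href="${escapeHtml(r.complianceUrl)}">GPAI compliance</a></td></tr>`,
    )
    .join("\n");
  return (
    "<table><thead><tr><th>Source</th><th>Role</th><th>Deployment</th>" +
    "<th>Provider</th><th>Compliance</th></tr></thead>\n" +
    `<tbody>\n${body}\n</tbody></table>`
  );
}
```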
System / model card (~½ day). Intended use, limitations
(see CLAUDE.md § "Known limitations"), languages supported
(currently German end-to-end), corpus scope, evaluation results
from eval/, escalation contact. Public.
Tenant-side AI literacy briefing (~2 hours per tenant). One-pager covering what RAG is, when to defer to the source PDF, and the system's intended use boundary.
Incident reporting hook (~½ day). A "report a wrong answer" button on each assistant message, writing to the interaction log with a flag. Becomes the input to the deployer's incident process.
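To keep the interaction log append-only, a report can be recorded as a new event that references the original interaction rather than as an edit to it. The event shapes and function names below are hypothetical. The tenant check mirrors the isolation rules described earlier: a report can flag only an interaction belonging to the reporter's own tenant.

```typescript
// Hypothetical log events; a report is a new event, never an edit.
type LogEvent =
  | { kind: "interaction"; id: string; tenantId: string; answer: string }
  | {
      kind: "wrong-answer-report";
      interactionId: string;
      tenantId: string;
      reportedAt: string; // ISO 8601
      comment?: string;
    };

function reportWrongAnswer(
  log: LogEvent[],
  tenantId: string,
  interactionId: string,
  reportedAt: string,
  comment?: string,
): void {
  const target = log.find((e) => e.kind === "interaction" && e.id === interactionId);
  // Refuse to flag interactions that do not exist or belong to another tenant.
  if (!target || target.tenantId !== tenantId) {
    throw new Error(`no interaction ${interactionId} for tenant ${tenantId}`);
  }
  log.push({
    kind: "wrong-answer-report",
    interactionId,
    tenantId,
    reportedAt,
    ...(comment !== undefined ? { comment } : {}),
  });
}

// Input for the deployer's incident process: flagged interaction ids per tenant.
function flaggedInteractions(log: LogEvent[], tenantId: string): string[] {
  const ids = new Set<string>();
  for (const e of log) {
    if (e.kind === "wrong-answer-report" && e.tenantId === tenantId) ids.add(e.interactionId);
  }
  return Array.from(ids);
}
```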
Export-time output marking (future). If/when the assistant gains export-to-PDF or share-link functionality, attach a machine-readable AI-generated marker at that boundary.
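At a JSON boundary such as a share-link payload, the marker can be a small, fixed-shape object that travels with the content. The field names, markForExport and isMarkedAiGenerated below are hypothetical. For PDF export, the equivalent would go into document metadata, and a standardised provenance format such as C2PA may be the better long-term fit.

```typescript
// Hypothetical machine-readable marker attached at the export boundary.
interface AiGeneratedMarker {
  aiGenerated: true;
  generator: string; // identifier of the generating system
  generatedAt: string; // ISO 8601
  tenantId: string;
}

interface MarkedExport<T> {
  content: T;
  marker: AiGeneratedMarker;
}

function markForExport<T>(content: T, tenantId: string, generator: string, now: Date): MarkedExport<T> {
  return {
    content,
    marker: { aiGenerated: true, generator, generatedAt: now.toISOString(), tenantId },
  };
}

// Consumer-side check that works on untyped input, e.g. a parsed JSON body.
function isMarkedAiGenerated(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const marker = (value as { marker?: unknown }).marker;
  return (
    typeof marker === "object" &&
    marker !== null &&
    (marker as { aiGenerated?: unknown }).aiGenerated === true
  );
}
```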
None of these require architectural changes. The system is deliberately designed so the compliance surface is additive.
When any of the following change, update the relevant section:
- ai-models.ts, models.py → Section 2, Art. 53–55 row.
- eval/questions.*.json → Section 2, Art. 15 row.
- /api/track, TenantVisit schema, rag-debug opt-out → Section 4, gap #1.