
Architecture

                           +--------------------+
                           |      Frontend      |
                           |   Next.js 15 SPA   |
                           +----------+---------+
                                      | HTTPS / SSE
                                      v
                           +----------+---------+
                           |    FastAPI API     |
                           |  JWT · slowapi     |
                           +----+-----------+---+
                                |           |
         +----------------------+           +-----------------+
         v                                                    v
+--------+---------+                                 +--------+-------+
| Compliance core  |                                 |   AI layer     |
|                  |                                 |                |
| · HMAC chain     |                                 | · Mistral      |
| · PII / bias     |                                 | · Ollama       |
| · Dossier zip    |                                 | · Qdrant       |
| · Incidents      |                                 | · BGE-M3       |
+--------+---------+                                 +--------+-------+
         |                                                    |
         +--------+-------------------------------------------+
                  v                                     v
         +--------+--------+                 +----------+-------+
         |  PostgreSQL 16  |                 |      Redis       |
         |  append-only    |                 |  rate limits     |
         |  (role REVOKE)  |                 |  budget counter  |
         +-----------------+                 +------------------+

The stack is deliberately boring — everything a mid-market SaaS already runs.

Request lifecycle (chat endpoint)

  1. HTTP request hits Caddy / Nginx (in prod) → FastAPI.
  2. get_tenant dependency resolves the JWT into a TenantContext (user_id, organization_id, user_role).
  3. enforce_org_chat_budget checks / increments the Redis per-org daily counter; blocks at ORG_DAILY_CHAT_CAP.
  4. get_llm_provider(org) returns a provider instance (Mistral, Ollama, or any other BaseLLMProvider implementation).
  5. compliance_engine.pre_check runs PII detection + prompt-injection flagging (never blocks, only logs).
  6. rag_service.search_all retrieves from the tenant's Qdrant collection + shared KB.
  7. provider.stream_chat(messages) streams tokens; an async generator yields them to the client via SSE.
  8. After the stream completes, compliance_engine.post_check computes confidence, source grounding, bias flags.
  9. audit_service.create_audit_entry writes the HMAC-chained row in a fresh transaction (the SSE generator outlives the HTTP request scope, so we open a new session).
  10. A low-confidence signal is emitted to the client just before [DONE].
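Steps 7–10 can be sketched as a single async generator that wraps the provider's token stream in SSE frames. This is an illustrative stand-in, not the real endpoint: the frame shapes, the placeholder confidence scoring, and the `fake_provider` helper are all assumptions for the sketch.

```python
import asyncio
import json
from typing import AsyncIterator


async def sse_chat_stream(
    token_stream: AsyncIterator[str],
    confidence_threshold: float = 0.5,
) -> AsyncIterator[str]:
    """Yield SSE frames for each token, then post-check signals, then [DONE]."""
    tokens: list[str] = []
    async for token in token_stream:
        tokens.append(token)
        yield f"data: {json.dumps({'token': token})}\n\n"
    # Stand-in for compliance_engine.post_check (placeholder scoring):
    confidence = 1.0 if tokens else 0.0
    if confidence < confidence_threshold:
        # Low-confidence signal emitted just before the terminator (step 10).
        yield f"data: {json.dumps({'event': 'low_confidence'})}\n\n"
    yield "data: [DONE]\n\n"


async def _demo() -> list[str]:
    async def fake_provider() -> AsyncIterator[str]:
        for t in ("Hello", ", ", "world"):
            yield t

    return [frame async for frame in sse_chat_stream(fake_provider())]


frames = asyncio.run(_demo())
```

Because the generator outlives the HTTP request scope (step 9), anything it does after the last token, such as the audit write, must not rely on request-scoped resources like the original DB session.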

Per-organisation isolation

Every row in every tenant-owned table carries an organization_id. Every query goes through tenant_select(Model, tenant), which pre-filters on it. The audit_log_invisible_cross_org integration test proves this from the outside.
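The invariant is easy to state in miniature. The sketch below is a plain in-memory stand-in for tenant_select (the real helper builds a SQL query); the TenantContext fields match the ones resolved in step 2 of the request lifecycle, everything else is illustrative.

```python
from dataclasses import dataclass


@dataclass
class TenantContext:
    user_id: str
    organization_id: str
    user_role: str


def tenant_filter(rows: list[dict], tenant: TenantContext) -> list[dict]:
    """Stand-in for tenant_select: every query is pre-filtered on
    organization_id, so cross-org rows can never be returned."""
    return [r for r in rows if r["organization_id"] == tenant.organization_id]


tenant = TenantContext("u1", "org-a", "member")
rows = [
    {"id": 1, "organization_id": "org-a"},
    {"id": 2, "organization_id": "org-b"},
]
visible = tenant_filter(rows, tenant)
```

The point of funnelling every query through one helper is that the filter cannot be forgotten at individual call sites.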

  • Postgres: row-level filtering by organization_id.
  • Qdrant: one collection per org (org_{uuid}_documents), with org_id also embedded in payload metadata as a second filter.
  • Redis: rate-limit keys are prefixed by user:{uuid} or ip:{addr}; budget counter keys are budget:chat:{org}:{YYYYMMDD}.
  • Hash chain: per-org HKDF subkey derived from the master HMAC key held outside Postgres. Compromise of one tenant's key cannot forge another tenant's chain.
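The per-org subkey derivation can be sketched with stdlib hmac/hashlib (RFC 5869 HKDF). The salt and info labels, key values, and payload shapes below are illustrative; the real parameters live in architecture/hash-chain.md.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """Minimal RFC 5869 HKDF (extract then expand) over SHA-256."""
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def org_subkey(master_key: bytes, org_id: str) -> bytes:
    # One subkey per tenant: compromising one cannot forge another's chain.
    return hkdf_sha256(master_key, salt=b"audit-chain-v1", info=org_id.encode())


def chain_entry(subkey: bytes, prev_hash: bytes, payload: bytes) -> bytes:
    # Each audit row's HMAC covers the previous row's hash, forming the chain.
    return hmac.new(subkey, prev_hash + payload, hashlib.sha256).digest()


master = b"\x00" * 32  # in production this key lives outside Postgres
k_a = org_subkey(master, "org-a")
k_b = org_subkey(master, "org-b")

genesis = b"\x00" * 32
h1 = chain_entry(k_a, genesis, b'{"action":"chat"}')
h2 = chain_entry(k_a, h1, b'{"action":"export"}')
```

Verification replays the chain from genesis: any edited or deleted row changes every subsequent hash.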

Append-only audit table

audit_logs carries REVOKE UPDATE, DELETE, TRUNCATE FROM lexcustis (the app DB role), so the app can only INSERT. Human oversight actions live in audit_log_oversight, a separate table that does have UPDATE privilege, FK'd one-to-one to audit_logs via a UNIQUE NOT NULL audit_log_id.

This is enforced in migration 001_oss_v01_initial.py. If you ever need to fix a bug via a migration that rewrites historical rows, run it as the Postgres superuser (not the app role) and manually document the event in the compliance report.

Multi-LLM providers

The ABC in backend/app/services/llm/base.py:

import abc
from typing import AsyncIterator


class BaseLLMProvider(abc.ABC):
    provider_name: str
    provider_model: str
    data_region: str

    @abc.abstractmethod
    async def stream_chat(
        self, messages, temperature=0.3
    ) -> AsyncIterator[str]:
        ...

Shipped concrete impls in OSS:

  • MistralProvider (EU, hosted, OpenAI-compatible)
  • OllamaProvider (self-hosted, OpenAI-compatible)

Both share an OpenAICompatibleProvider base because their SSE format is identical. get_llm_provider(org) is the factory — per-org settings first, then env defaults.
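The factory shape can be sketched as a registry keyed by provider name. The setting keys (`llm_provider`, `llm_model`), env var names, defaults, and the stub classes below are all assumptions for illustration, not the real configuration surface.

```python
import os
from typing import Type


class BaseLLMProvider:  # stands in for the ABC shown above
    provider_name: str = ""

    def __init__(self, model: str) -> None:
        self.provider_model = model


class MistralProvider(BaseLLMProvider):
    provider_name = "mistral"


class OllamaProvider(BaseLLMProvider):
    provider_name = "ollama"


_REGISTRY: dict[str, Type[BaseLLMProvider]] = {
    "mistral": MistralProvider,
    "ollama": OllamaProvider,
}


def get_llm_provider(org_settings: dict) -> BaseLLMProvider:
    """Per-org settings win; env defaults are the fallback."""
    name = org_settings.get("llm_provider") or os.environ.get("LLM_PROVIDER", "ollama")
    model = org_settings.get("llm_model") or os.environ.get("LLM_MODEL", "mistral-small")
    return _REGISTRY[name](model)


p = get_llm_provider({"llm_provider": "mistral", "llm_model": "mistral-large-latest"})
```

A registry keyed by name is also what lets the commercial plugin add providers without touching the factory: it only needs to register new entries against the same ABC.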

Anthropic / OpenAI / Azure OpenAI live in a closed commercial plugin repo — same ABC, separate deployable.

Embeddings

RAG uses BGE-M3 (BAAI, 1024-dim) via sentence-transformers, loaded lazily by rag_service. Embeddings run locally in the backend container — no data leaves the host for the vector path, regardless of which LLM provider you pick.
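The lazy-loading pattern looks roughly like the sketch below. The class name and the injectable loader are assumptions (the real code lives in rag_service); the loader is injected here so the sketch runs without downloading the model.

```python
from typing import Callable, Optional


class LazyEmbedder:
    """Construct the heavyweight embedding model on first use, then cache it,
    so importing the service stays cheap and the model loads at most once."""

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader
        self._model: Optional[object] = None
        self.load_count = 0  # instrumentation for the sketch

    @property
    def model(self) -> object:
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model


# In the real service the loader would be something like:
#   lambda: SentenceTransformer("BAAI/bge-m3")   # 1024-dim embeddings
embedder = LazyEmbedder(loader=lambda: object())
_ = embedder.model
_ = embedder.model
```

Deferring the load keeps container start-up fast; the first RAG query pays the one-time model-load cost instead.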

What's not in this diagram

  • Celery worker is provisioned but currently unused in v0.1 (placeholder for commercial PMM batch jobs).
  • Backup / DR is your job for OSS. Commercial edition includes managed Postgres snapshots + Qdrant exports to object storage.
  • Monitoring (Prometheus, Grafana, Sentry) is wired up in paid tier, not in the OSS stack.

See architecture/hash-chain.md, architecture/multi-tenancy.md, architecture/providers.md, architecture/dossier.md for deeper dives.