
Architecture

                           +--------------------+
                           |      Frontend      |
                           |   Next.js 15 SPA   |
                           +----------+---------+
                                      | HTTPS / SSE
                                      v
                           +----------+---------+
                           |    FastAPI API     |
                           |  JWT · slowapi     |
                           +----+-----------+---+
                                |           |
         +----------------------+           +-----------------+
         v                                                    v
+--------+---------+                                 +--------+-------+
| Compliance core  |                                 |   AI layer     |
|                  |                                 |                |
| · HMAC chain     |                                 | · Mistral      |
| · PII / bias     |                                 | · Ollama       |
| · Dossier zip    |                                 | · Qdrant       |
| · Incidents      |                                 | · BGE-M3       |
+--------+---------+                                 +--------+-------+
         |                                                    |
         +--------+-------------------------------------------+
                  v                                     v
         +--------+--------+                 +----------+-------+
         |  PostgreSQL 16  |                 |      Redis       |
         |  append-only    |                 |  rate limits     |
         |  (role REVOKE)  |                 |  budget counter  |
         +-----------------+                 +------------------+

The stack is deliberately boring — everything a mid-market SaaS already runs.

Request lifecycle (chat endpoint)

  1. HTTP request hits Caddy / Nginx (in prod) → FastAPI.
  2. get_tenant dependency resolves the JWT into a TenantContext (user_id, organization_id, user_role).
  3. enforce_org_chat_budget checks / increments the Redis per-org daily counter; blocks at ORG_DAILY_CHAT_CAP.
  4. get_llm_provider(org) returns a provider instance (Mistral, Ollama, or any other BaseLLMProvider implementation).
  5. compliance_engine.pre_check runs PII detection + prompt-injection flagging (never blocks, only logs).
  6. rag_service.search_all retrieves from the tenant's Qdrant collection + shared KB.
  7. provider.stream_chat(messages) streams tokens; an async generator yields them to the client via SSE.
  8. After the stream completes, compliance_engine.post_check computes confidence, source grounding, bias flags.
  9. audit_service.create_audit_entry writes the HMAC-chained row in a fresh transaction (the SSE generator outlives the HTTP request scope, so we open a new session).
  10. A low-confidence signal is emitted to the client just before [DONE].
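Steps 7–10 can be sketched as a single async generator that wraps the provider's token stream in SSE frames. This is an illustrative stand-in, not the real endpoint: the frame shapes, the placeholder confidence scoring, and the `fake_provider` helper are all assumptions for the sketch.

```python
import asyncio
import json
from typing import AsyncIterator


async def sse_chat_stream(
    token_stream: AsyncIterator[str],
    confidence_threshold: float = 0.5,
) -> AsyncIterator[str]:
    """Yield SSE frames for each token, then post-check signals, then [DONE]."""
    tokens: list[str] = []
    async for token in token_stream:
        tokens.append(token)
        yield f"data: {json.dumps({'token': token})}\n\n"
    # Stand-in for compliance_engine.post_check (placeholder scoring):
    confidence = 1.0 if tokens else 0.0
    if confidence < confidence_threshold:
        # Low-confidence signal emitted just before the terminator (step 10).
        yield f"data: {json.dumps({'event': 'low_confidence'})}\n\n"
    yield "data: [DONE]\n\n"


async def _demo() -> list[str]:
    async def fake_provider() -> AsyncIterator[str]:
        for t in ("Hello", ", ", "world"):
            yield t

    return [frame async for frame in sse_chat_stream(fake_provider())]


frames = asyncio.run(_demo())
```

Because the generator outlives the HTTP request scope (step 9), anything it does after the last token, such as the audit write, must not rely on request-scoped resources like the original DB session.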

Per-organisation isolation

Every row in every tenant-owned table carries an organization_id. Every query goes through tenant_select(Model, tenant), which pre-filters on it. The audit_log_invisible_cross_org integration test proves this from the outside.
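The invariant is easy to state in miniature. The sketch below is a plain in-memory stand-in for tenant_select (the real helper builds a SQL query); the TenantContext fields match the ones resolved in step 2 of the request lifecycle, everything else is illustrative.

```python
from dataclasses import dataclass


@dataclass
class TenantContext:
    user_id: str
    organization_id: str
    user_role: str


def tenant_filter(rows: list[dict], tenant: TenantContext) -> list[dict]:
    """Stand-in for tenant_select: every query is pre-filtered on
    organization_id, so cross-org rows can never be returned."""
    return [r for r in rows if r["organization_id"] == tenant.organization_id]


tenant = TenantContext("u1", "org-a", "member")
rows = [
    {"id": 1, "organization_id": "org-a"},
    {"id": 2, "organization_id": "org-b"},
]
visible = tenant_filter(rows, tenant)
```

The point of funnelling every query through one helper is that the filter cannot be forgotten at individual call sites.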

  • Postgres: row-level filtering by organization_id.
  • Qdrant: one collection per org (org_{uuid}_documents), with org_id also embedded in payload metadata as a second filter.
  • Redis: rate-limit keys are prefixed by user:{uuid} or ip:{addr}; budget counter keys are budget:chat:{org}:{YYYYMMDD}.
  • Hash chain: per-org HKDF subkey derived from the master HMAC key held outside Postgres. Compromise of one tenant's key cannot forge another tenant's chain.
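The per-org subkey derivation can be sketched with stdlib hmac/hashlib (RFC 5869 HKDF). The salt and info labels, key values, and payload shapes below are illustrative; the real parameters live in architecture/hash-chain.md.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """Minimal RFC 5869 HKDF (extract then expand) over SHA-256."""
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def org_subkey(master_key: bytes, org_id: str) -> bytes:
    # One subkey per tenant: compromising one cannot forge another's chain.
    return hkdf_sha256(master_key, salt=b"audit-chain-v1", info=org_id.encode())


def chain_entry(subkey: bytes, prev_hash: bytes, payload: bytes) -> bytes:
    # Each audit row's HMAC covers the previous row's hash, forming the chain.
    return hmac.new(subkey, prev_hash + payload, hashlib.sha256).digest()


master = b"\x00" * 32  # in production this key lives outside Postgres
k_a = org_subkey(master, "org-a")
k_b = org_subkey(master, "org-b")

genesis = b"\x00" * 32
h1 = chain_entry(k_a, genesis, b'{"action":"chat"}')
h2 = chain_entry(k_a, h1, b'{"action":"export"}')
```

Verification replays the chain from genesis: any edited or deleted row changes every subsequent hash.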

Append-only audit table

audit_logs carries REVOKE UPDATE, DELETE, TRUNCATE FROM lexcustis (the app DB role), so the app can only INSERT. Human oversight actions live in audit_log_oversight, a separate table that does have UPDATE privilege, FK'd one-to-one to audit_logs via a UNIQUE NOT NULL audit_log_id.

This is enforced in migration 001_oss_v01_initial.py. If you ever need to fix a bug via a migration that rewrites historical rows, run it as the Postgres superuser (not the app role) and manually document the event in the compliance report.

Multi-LLM providers

The ABC in backend/app/services/llm/base.py:

import abc
from typing import AsyncIterator


class BaseLLMProvider(abc.ABC):
    provider_name: str
    provider_model: str
    data_region: str

    @abc.abstractmethod
    async def stream_chat(
        self, messages, temperature=0.3
    ) -> AsyncIterator[str]:
        ...

Shipped concrete impls in OSS:

  • MistralProvider (EU, hosted, OpenAI-compatible)
  • OllamaProvider (self-hosted, OpenAI-compatible)

Both share an OpenAICompatibleProvider base because their SSE format is identical. get_llm_provider(org) is the factory — per-org settings first, then env defaults.
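The factory shape can be sketched as a registry keyed by provider name. The setting keys (`llm_provider`, `llm_model`), env var names, defaults, and the stub classes below are all assumptions for illustration, not the real configuration surface.

```python
import os
from typing import Type


class BaseLLMProvider:  # stands in for the ABC shown above
    provider_name: str = ""

    def __init__(self, model: str) -> None:
        self.provider_model = model


class MistralProvider(BaseLLMProvider):
    provider_name = "mistral"


class OllamaProvider(BaseLLMProvider):
    provider_name = "ollama"


_REGISTRY: dict[str, Type[BaseLLMProvider]] = {
    "mistral": MistralProvider,
    "ollama": OllamaProvider,
}


def get_llm_provider(org_settings: dict) -> BaseLLMProvider:
    """Per-org settings win; env defaults are the fallback."""
    name = org_settings.get("llm_provider") or os.environ.get("LLM_PROVIDER", "ollama")
    model = org_settings.get("llm_model") or os.environ.get("LLM_MODEL", "mistral-small")
    return _REGISTRY[name](model)


p = get_llm_provider({"llm_provider": "mistral", "llm_model": "mistral-large-latest"})
```

A registry keyed by name is also what lets the commercial plugin add providers without touching the factory: it only needs to register new entries against the same ABC.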

Anthropic / OpenAI / Azure OpenAI live in a closed commercial plugin repo — same ABC, separate deployable.

Embeddings

RAG uses BGE-M3 (BAAI, 1024-dim) via sentence-transformers, loaded lazily by rag_service. Embeddings run locally in the backend container — no data leaves the host for the vector path, regardless of which LLM provider you pick.
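The lazy-loading pattern looks roughly like the sketch below. The class name and the injectable loader are assumptions (the real code lives in rag_service); the loader is injected here so the sketch runs without downloading the model.

```python
from typing import Callable, Optional


class LazyEmbedder:
    """Construct the heavyweight embedding model on first use, then cache it,
    so importing the service stays cheap and the model loads at most once."""

    def __init__(self, loader: Callable[[], object]) -> None:
        self._loader = loader
        self._model: Optional[object] = None
        self.load_count = 0  # instrumentation for the sketch

    @property
    def model(self) -> object:
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model


# In the real service the loader would be something like:
#   lambda: SentenceTransformer("BAAI/bge-m3")   # 1024-dim embeddings
embedder = LazyEmbedder(loader=lambda: object())
_ = embedder.model
_ = embedder.model
```

Deferring the load keeps container start-up fast; the first RAG query pays the one-time model-load cost instead.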

What's not in this diagram

  • Celery worker is provisioned but currently unused in v0.1 (placeholder for commercial PMM batch jobs).
  • Backup / DR is your job for OSS. Commercial edition includes managed Postgres snapshots + Qdrant exports to object storage.
  • Monitoring (Prometheus, Grafana, Sentry) is wired up in paid tier, not in the OSS stack.

See architecture/hash-chain.md, architecture/multi-tenancy.md, architecture/providers.md, architecture/dossier.md for deeper dives.