Architecture¶
                           +--------------------+
                           |      Frontend      |
                           |   Next.js 15 SPA   |
                           +----------+---------+
                                      | HTTPS / SSE
                                      v
                           +----------+---------+
                           |    FastAPI API     |
                           |   JWT · slowapi    |
                           +----+-----------+---+
                                |           |
         +----------------------+           +-----------------+
         v                                                    v
+--------+---------+                                 +--------+-------+
| Compliance core  |                                 |    AI layer    |
|                  |                                 |                |
| · HMAC chain     |                                 | · Mistral      |
| · PII / bias     |                                 | · Ollama       |
| · Dossier zip    |                                 | · Qdrant       |
| · Incidents      |                                 | · BGE-M3       |
+--------+---------+                                 +--------+-------+
         |                                                    |
         +--------+-------------------------------------------+
                  v                                           v
         +--------+--------+                       +----------+-------+
         | PostgreSQL 16   |                       | Redis            |
         | append-only     |                       | rate limits      |
         | (role REVOKE)   |                       | budget counter   |
         +-----------------+                       +------------------+
The stack is deliberately boring — everything a mid-market SaaS already runs.
Request lifecycle (chat endpoint)¶
- HTTP request hits Caddy / Nginx (in prod) → FastAPI.
- get_tenant dependency resolves the JWT into a TenantContext(user_id, organization_id, user_role).
- enforce_org_chat_budget checks / increments the Redis per-org daily counter; blocks at ORG_DAILY_CHAT_CAP.
- get_llm_provider(org) returns a provider instance (Mistral, Ollama, or any other BaseLLMProvider implementation).
- compliance_engine.pre_check runs PII detection + prompt-injection flagging (never blocks, only logs).
- rag_service.search_all retrieves from the tenant's Qdrant collection + shared KB.
- provider.stream_chat(messages) streams tokens; an async generator yields them to the client via SSE.
- After the stream completes, compliance_engine.post_check computes confidence, source grounding, bias flags.
- audit_service.create_audit_entry writes the HMAC-chained row in a fresh transaction (the SSE generator outlives the HTTP request scope, so we open a new session).
- A low-confidence signal is emitted to the client just before [DONE].
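The ordering of the streaming and post-stream steps above can be sketched as an async generator. This is a minimal illustration under stated assumptions, not the project's code: `fake_stream`, `sse_chat`, the `post_check` and `write_audit` callables, the 0.5 threshold, and the event payload shapes are all hypothetical stand-ins.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable


async def fake_stream(tokens: list[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for provider.stream_chat(messages)."""
    for tok in tokens:
        await asyncio.sleep(0)  # hand control back to the loop, as a network read would
        yield tok


async def sse_chat(
    token_stream: AsyncIterator[str],
    post_check: Callable[[str], float],
    write_audit: Callable[[str, float], Awaitable[None]],
    low_confidence_below: float = 0.5,  # assumed threshold, not from the docs
) -> AsyncIterator[str]:
    parts: list[str] = []
    async for tok in token_stream:
        parts.append(tok)
        yield "data: " + json.dumps({"token": tok}) + "\n\n"

    # The stream is finished: score the full answer, then persist the audit row.
    answer = "".join(parts)
    confidence = post_check(answer)
    await write_audit(answer, confidence)

    # The low-confidence signal goes out just before the terminator.
    if confidence < low_confidence_below:
        yield "data: " + json.dumps({"warning": "low_confidence"}) + "\n\n"
    yield "data: [DONE]\n\n"


async def demo() -> tuple[list[str], list[tuple[str, float]]]:
    audit_rows: list[tuple[str, float]] = []

    async def write_audit(answer: str, confidence: float) -> None:
        # Real code would open a new DB session here; this only records the call.
        audit_rows.append((answer, confidence))

    stream = sse_chat(fake_stream(["Hel", "lo"]), lambda _answer: 0.2, write_audit)
    events = [event async for event in stream]
    return events, audit_rows
```

Running `asyncio.run(demo())` yields two token events, one warning event, and the `[DONE]` terminator, with the audit callback invoked once after the last token. Because the audit write happens inside the generator, it cannot reuse a session tied to the request scope, which matches the reasoning given for opening a new one.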
Per-organisation isolation¶
Every row in every tenant-owned table has an organization_id. Every
query goes through tenant_select(Model, tenant), which pre-filters on it.
The audit_log_invisible_cross_org integration test verifies this
from the outside.
- Postgres: row-level filtering by organization_id.
- Qdrant: one collection per org (org_{uuid}_documents), with org_id also embedded in payload metadata as a second filter.
- Redis: rate-limit keys are prefixed by user:{uuid} or ip:{addr}; budget counter keys are budget:chat:{org}:{YYYYMMDD}.
- Hash chain: per-org HKDF subkey derived from the master HMAC key held outside Postgres. Compromise of one tenant's key cannot forge another tenant's chain.
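The naming schemes and the per-org subkey idea can be illustrated with the standard library alone. This is a sketch under assumptions: HKDF is written out per RFC 5869 with SHA-256, while the `info` label, the empty genesis tag, the chaining formula, and every function name here are hypothetical and may differ from the real implementation.

```python
import hashlib
import hmac
from datetime import date


def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def org_subkey(master_key: bytes, org_id: str) -> bytes:
    # Assumption: the org id is bound into the HKDF "info" field.
    return hkdf_sha256(master_key, info=b"audit-chain:" + org_id.encode())


def chain_tag(subkey: bytes, prev_tag: str, payload: bytes) -> str:
    # Assumption: each tag covers the previous tag plus the new row's payload.
    return hmac.new(subkey, prev_tag.encode() + payload, hashlib.sha256).hexdigest()


def collection_name(org_uuid: str) -> str:
    return f"org_{org_uuid}_documents"


def budget_key(org_id: str, day: date) -> str:
    return f"budget:chat:{org_id}:{day:%Y%m%d}"
```

With distinct subkeys, the same payload produces different tags for two organisations, so a tag computed with one tenant's subkey does not verify under another's.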
Append-only audit table¶
audit_logs has REVOKE UPDATE, DELETE, TRUNCATE FROM lexcustis
(the app DB role). The app can INSERT only. Human oversight actions
live in audit_log_oversight, a separate table on which the app role keeps
UPDATE privilege. It is linked 1-to-1 to audit_logs by a foreign key on
audit_log_id, which is declared UNIQUE NOT NULL.
This is enforced in the migration 001_oss_v01_initial.py. If you
ever need to fix a bug via a migration that rewrites historical rows,
run it as the Postgres superuser (not the app role) and document the
event in the compliance report manually.
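As a rough illustration of the privilege split, the statements such a migration might issue can be generated like this. The helper name is hypothetical, and granting SELECT to the app role is an assumption. Only the REVOKE list, the two table names, and the `lexcustis` role come from the text above.

```python
def append_only_statements(role: str = "lexcustis") -> list[str]:
    """Build the privilege statements for an append-only audit table.

    `role` is interpolated as an identifier, so pass only trusted, fixed values.
    """
    return [
        # Assumption: the app role may read and insert audit rows.
        f"GRANT SELECT, INSERT ON audit_logs TO {role}",
        # From the docs: no rewriting or removing history as the app role.
        f"REVOKE UPDATE, DELETE, TRUNCATE ON audit_logs FROM {role}",
        # Oversight rows stay mutable, in their own table.
        f"GRANT SELECT, INSERT, UPDATE ON audit_log_oversight TO {role}",
    ]
```

In an Alembic migration, each string could be passed to `op.execute(...)` in turn.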
Multi-LLM providers¶
The ABC in backend/app/services/llm/base.py:
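import abc
from collections.abc import AsyncIterator


class BaseLLMProvider(abc.ABC):
    provider_name: str
    provider_model: str
    data_region: str

    @abc.abstractmethod
    async def stream_chat(
        self, messages, temperature=0.3
    ) -> AsyncIterator[str]:
        # Concrete providers stream the response token by token.
        ...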
Shipped concrete impls in OSS:
- MistralProvider (EU, hosted, OpenAI-compatible)
- OllamaProvider (self-hosted, OpenAI-compatible)
Both share an OpenAICompatibleProvider base because their SSE format
is identical. get_llm_provider(org) is the factory — per-org
settings first, then env defaults.
Anthropic / OpenAI / Azure OpenAI live in a closed commercial plugin repo — same ABC, separate deployable.
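The "per-org settings first, then env defaults" lookup might look roughly like the sketch below. Everything beyond the ABC shape is an assumption made for illustration: `EchoProvider`, `ShoutProvider`, `REGISTRY`, `pick_provider`, the `llm_provider` settings key, and the `LLM_PROVIDER` environment variable are hypothetical names, not the project's API.

```python
import abc
import asyncio
import os
from typing import AsyncIterator, Mapping


class BaseLLMProvider(abc.ABC):
    """Trimmed copy of the ABC, so the sketch runs on its own."""

    provider_name: str

    @abc.abstractmethod
    async def stream_chat(self, messages, temperature=0.3) -> AsyncIterator[str]:
        ...


class EchoProvider(BaseLLMProvider):
    """Hypothetical provider: streams the last message back word by word."""

    provider_name = "echo"

    async def stream_chat(self, messages, temperature=0.3):
        for word in messages[-1]["content"].split():
            yield word


class ShoutProvider(EchoProvider):
    """Hypothetical second provider, to show that selection has an effect."""

    provider_name = "shout"

    async def stream_chat(self, messages, temperature=0.3):
        async for word in super().stream_chat(messages, temperature):
            yield word.upper()


REGISTRY: dict[str, type[BaseLLMProvider]] = {
    "echo": EchoProvider,
    "shout": ShoutProvider,
}


def pick_provider(
    org_settings: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> BaseLLMProvider:
    # The per-org setting wins; the environment supplies the default.
    name = org_settings.get("llm_provider") or env.get("LLM_PROVIDER", "echo")
    if name not in REGISTRY:
        raise ValueError(f"unknown LLM provider: {name!r}")
    return REGISTRY[name]()


async def collect(provider: BaseLLMProvider, messages) -> list[str]:
    return [tok async for tok in provider.stream_chat(messages)]
```

A registry keyed by name is one way to let a separately deployed plugin add providers without touching the factory. Whether the real factory works this way is not stated in this document.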
Embeddings¶
RAG uses BGE-M3 (BAAI, 1024-dim) via sentence-transformers,
loaded lazily by rag_service. Embeddings run locally in the backend
container — no data leaves the host for the vector path, regardless of
which LLM provider you pick.
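Lazy loading here means the model is only constructed on the first embedding call. A minimal sketch, assuming the `BAAI/bge-m3` model id and a hypothetical `LazyEmbedder` wrapper (the real `rag_service` may be organised differently). The `factory` parameter exists only so the wrapper can be exercised without downloading the model.

```python
from typing import Any, Callable, Optional, Sequence


class LazyEmbedder:
    """Defers loading the embedding model until the first embed() call."""

    def __init__(
        self,
        model_name: str = "BAAI/bge-m3",  # assumed Hugging Face model id
        factory: Optional[Callable[[str], Any]] = None,
    ) -> None:
        self._model_name = model_name
        self._factory = factory
        self._model: Any = None

    def _load(self) -> Any:
        if self._model is None:
            if self._factory is not None:
                self._model = self._factory(self._model_name)
            else:
                # Imported here so that importing this module stays cheap.
                from sentence_transformers import SentenceTransformer

                self._model = SentenceTransformer(self._model_name)
        return self._model

    def embed(self, texts: Sequence[str]) -> Any:
        # encode() runs in-process; no text is sent to an external service.
        return self._load().encode(list(texts), normalize_embeddings=True)
```

With the default factory, `LazyEmbedder().embed(["hello"])` would load the model on first use and return one dense vector per input text.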
What's not in this diagram¶
- Celery worker is provisioned but currently unused in v0.1 (placeholder for commercial PMM batch jobs).
- Backup / DR is your job for OSS. Commercial edition includes managed Postgres snapshots + Qdrant exports to object storage.
- Monitoring (Prometheus, Grafana, Sentry) is wired up in the paid tier, not in the OSS stack.
See architecture/hash-chain.md, architecture/multi-tenancy.md, architecture/providers.md, architecture/dossier.md for deeper dives.