Tamper-evident hash chain
This is the core cryptographic claim the product makes. If this design is sound, an attacker with Postgres access cannot rewrite the audit log without detection. If it's broken, every other promise collapses.
This page walks the design in enough detail that a regulator's technical team or an external auditor can reproduce and review it.
The claim (formal)
Given:
- A master secret K held outside Postgres (environment variable in OSS v0.1; KMS-wrapped in commercial edition).
- A per-organisation subkey k_{org} = HKDF(K, salt=org_id, info=…, 32).
- A canonical per-entry payload p_i = previous_hash | org_id | user_id | sequence_number | provider | model | timestamp | sha256(prompt) | sha256(response).
- Each entry's recorded hash h_i = HMAC-SHA-256(k_{org}, p_i).
An adversary with full read/write on audit_logs but no access to
K cannot:
- Modify any entry's user_prompt, ai_response, llm_provider, llm_model, request_at, user_id, organization_id, or sequence_number without changing h_i.
- Insert a forged entry with a chosen current_hash that validates against the chain.
- Delete an entry and re-link the chain around it.
The chain-walker in audit_service.verify_chain() recomputes every
h_i using the live K and flags the first broken entry with its
sequence_number.
Why HMAC, not plain SHA-256
The previous design was sha256(prev | prompt | response | model |
timestamp). An attacker with DB access can:
- Edit the target entry's content.
- Recompute current_hash.
- Walk the chain forward, recomputing every subsequent current_hash and previous_hash.
- verify_chain() (which also only has the public hash function) sees a valid chain.
Plain hashes authenticate continuity, not authorship. Only a keyed MAC bound to a secret the attacker doesn't control protects against the rewrite attack.
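The rewrite attack can be reproduced in a few lines. The following is a minimal sketch, not the product's code: each entry is reduced to a single content string, the genesis value and both keys are made-up placeholders, and the helper names (plain_link, keyed_link, build, verify) are hypothetical. It shows that a chain rebuilt by an attacker passes an unkeyed verifier but fails a keyed one.

```python
import hashlib
import hmac

GENESIS = "0" * 64  # placeholder genesis value for this sketch only


def plain_link(prev: str, content: str) -> str:
    # Unkeyed design: anyone with DB access can recompute this.
    return hashlib.sha256(f"{prev}|{content}".encode("utf-8")).hexdigest()


def keyed_link(key: bytes, prev: str, content: str) -> str:
    # Keyed design: recomputing requires the secret key.
    return hmac.new(key, f"{prev}|{content}".encode("utf-8"), hashlib.sha256).hexdigest()


def build(contents, link):
    """Build a chain from scratch using the given link function."""
    chain, prev = [], GENESIS
    for content in contents:
        current = link(prev, content)
        chain.append({"previous_hash": prev, "content": content, "current_hash": current})
        prev = current
    return chain


def verify(chain, link) -> bool:
    """Walk the chain and recompute every link."""
    prev = GENESIS
    for row in chain:
        if row["previous_hash"] != prev:
            return False
        if row["current_hash"] != link(prev, row["content"]):
            return False
        prev = row["current_hash"]
    return True


# The attacker edits the second entry and re-links everything after it.
tampered_contents = ["first", "TAMPERED", "third"]

# Unkeyed chain: the attacker uses the same public function as the verifier.
forged_plain = build(tampered_contents, plain_link)
plain_attack_undetected = verify(forged_plain, plain_link)

# Keyed chain: the attacker has to guess a key; the verifier uses the real one.
server_key = b"held-outside-the-database"    # hypothetical secret
attacker_key = b"attacker-has-to-guess-this"  # hypothetical wrong guess
forged_keyed = build(tampered_contents, lambda p, c: keyed_link(attacker_key, p, c))
keyed_attack_undetected = verify(forged_keyed, lambda p, c: keyed_link(server_key, p, c))

print(plain_attack_undetected)  # True: the rewrite goes unnoticed
print(keyed_attack_undetected)  # False: the first forged entry fails
```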
Payload canonicalisation
payload = (
    previous_hash                 # hex string, 64 chars
    + "|" + str(org_id)           # UUID, stringified
    + "|" + str(user_id)          # UUID, stringified
    + "|" + str(sequence_no)      # decimal integer
    + "|" + llm_provider          # e.g. "mistral"
    + "|" + llm_model             # e.g. "mistral-large-latest"
    + "|" + timestamp_iso         # request_at in ISO 8601
    + "|" + sha256_hex(user_prompt.encode("utf-8"))
    + "|" + sha256_hex(ai_response.encode("utf-8"))
)
The prompt and response are hashed before concatenation so a pipe character in user input cannot masquerade as a field separator.
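As an illustration of why hashing the free-text fields matters, here is a small self-contained sketch of the payload construction and the HMAC over it. The function names canonical_payload and entry_hash, the sample field values, and the choice to UTF-8 encode the payload before the HMAC are assumptions made for this example; the real implementation may differ.

```python
import hashlib
import hmac
import uuid


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_payload(previous_hash, org_id, user_id, sequence_no,
                      llm_provider, llm_model, timestamp_iso,
                      user_prompt, ai_response) -> str:
    # Nine fields joined by eight separators; the free-text fields are
    # replaced by fixed-length hex digests, which never contain "|".
    return "|".join([
        previous_hash,
        str(org_id),
        str(user_id),
        str(sequence_no),
        llm_provider,
        llm_model,
        timestamp_iso,
        sha256_hex(user_prompt.encode("utf-8")),
        sha256_hex(ai_response.encode("utf-8")),
    ])


def entry_hash(k_org: bytes, payload: str) -> str:
    # Assumption: the payload is UTF-8 encoded before the HMAC.
    return hmac.new(k_org, payload.encode("utf-8"), hashlib.sha256).hexdigest()


# Fixed, made-up values so the example is reproducible.
common = dict(
    previous_hash="ab" * 32,
    org_id=uuid.UUID(int=1),
    user_id=uuid.UUID(int=2),
    sequence_no=7,
    llm_provider="mistral",
    llm_model="mistral-large-latest",
    timestamp_iso="2024-05-01T12:30:45.123456+00:00",
)

# Two entries whose raw text would be identical if joined naively with "|".
payload_a = canonical_payload(**common, user_prompt="a|b", ai_response="c")
payload_b = canonical_payload(**common, user_prompt="a", ai_response="b|c")

demo_key = bytes(range(32))  # stand-in for a derived per-org key
print(payload_a.count("|"))  # 8 separators in both cases
print(entry_hash(demo_key, payload_a) != entry_hash(demo_key, payload_b))  # True
```

With a naive join, both entries would end in `a|b|c` and be indistinguishable. With the digests in place, the field boundaries stay unambiguous.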
timestamp_iso must be the exact string produced by datetime.isoformat() in Python at insertion time. Recomputation on read uses entry.request_at.isoformat(), so the value read back must equal the value written. If your DB driver stores the timestamp with microsecond precision but returns only milliseconds, the two strings differ and verification breaks. Lex Custis uses timestamptz with default precision on both the write and read paths, so the strings stay consistent.
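The sensitivity to the exact timestamp string is easy to see with the standard library alone. The sketch below uses a made-up timestamp and simulates a driver that truncates to milliseconds. It also shows two other standard isoformat() behaviours that change the string without changing the instant: a different UTC offset, and a zero microsecond component.

```python
from datetime import datetime, timedelta, timezone

# Value as written at insertion time (made-up example).
written = datetime(2024, 5, 1, 12, 30, 45, 123456, tzinfo=timezone.utc)

# Simulated read path that only preserves milliseconds.
truncated = written.replace(microsecond=written.microsecond // 1000 * 1000)

# Same instant, but returned in a different UTC offset.
shifted = written.astimezone(timezone(timedelta(hours=2)))

# isoformat() omits the fractional part entirely when microsecond == 0.
whole_second = written.replace(microsecond=0)

print(written.isoformat())       # 2024-05-01T12:30:45.123456+00:00
print(truncated.isoformat())     # 2024-05-01T12:30:45.123000+00:00
print(shifted.isoformat())       # 2024-05-01T14:30:45.123456+02:00
print(whole_second.isoformat())  # 2024-05-01T12:30:45+00:00

# Equal as instants, yet the payload strings (and so the HMACs) differ.
print(shifted == written)                           # True
print(shifted.isoformat() == written.isoformat())   # False
```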
Per-org key derivation (HKDF)
k_org = HKDF-SHA-256(
    IKM  = AUDIT_HMAC_MASTER_KEY (utf-8 bytes),
    salt = org_id.bytes (16 bytes),
    info = b"lex-custis/audit-log/per-org-hmac-key/v1",
    L    = 32 bytes,
)
Using org_id.bytes as the salt means two different organisations
cannot collide on a key, and the same organisation consistently
derives the same key across backend restarts.
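For reviewers who want to reproduce the derivation without extra dependencies, the following sketch implements the HKDF extract-and-expand steps from RFC 5869 with Python's hmac module. It is an illustration under stated assumptions: the function name hkdf_sha256, this signature for derive_org_key, and the sample master key are hypothetical, and the actual service may rely on a cryptography library instead.

```python
import hashlib
import hmac
import uuid

HKDF_INFO = b"lex-custis/audit-log/per-org-hmac-key/v1"


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract, then expand."""
    if length > 255 * 32:
        raise ValueError("requested length too large for HKDF-SHA-256")
    # Extract: PRK = HMAC(salt, IKM)
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: T(n) = HMAC(PRK, T(n-1) | info | n)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_org_key(master_key: str, org_id: uuid.UUID) -> bytes:
    # IKM is the master key as UTF-8 bytes; salt is the 16-byte UUID.
    return hkdf_sha256(master_key.encode("utf-8"), org_id.bytes, HKDF_INFO, 32)


master = "example-master-key-not-for-production"  # hypothetical value
org_a = uuid.UUID(int=1)
org_b = uuid.UUID(int=2)

key_a = derive_org_key(master, org_a)
key_b = derive_org_key(master, org_b)

print(len(key_a))                              # 32
print(key_a == derive_org_key(master, org_a))  # True: stable across restarts
print(key_a != key_b)                          # True: orgs get distinct keys
```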
GENESIS_HASH
The first entry per org uses previous_hash = sha256(b"LEX_CUSTIS_GENESIS").
Every org's chain therefore starts from the same constant, but diverges
immediately because k_org is org-specific.
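A short sketch of that divergence, assuming the genesis value is stored as a 64-character hex digest (consistent with previous_hash being a hex string). The two keys are arbitrary stand-ins for derived per-org keys, and first_entry_hash is a hypothetical helper that collapses the remaining payload fields into one string.

```python
import hashlib
import hmac

# Shared starting point for every organisation's chain.
GENESIS_HASH = hashlib.sha256(b"LEX_CUSTIS_GENESIS").hexdigest()


def first_entry_hash(k_org: bytes, rest_of_payload: str) -> str:
    # The first entry uses GENESIS_HASH in the previous_hash position.
    payload = GENESIS_HASH + "|" + rest_of_payload
    return hmac.new(k_org, payload.encode("utf-8"), hashlib.sha256).hexdigest()


key_org_a = bytes(range(32))      # stand-in for org A's derived key
key_org_b = bytes(range(32, 64))  # stand-in for org B's derived key

# Even with identical remaining fields, the first hashes differ.
hash_a = first_entry_hash(key_org_a, "identical-remaining-fields")
hash_b = first_entry_hash(key_org_b, "identical-remaining-fields")

print(len(GENESIS_HASH))  # 64
print(hash_a != hash_b)   # True
```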
Verification procedure
audit_service.verify_chain(org_id, db):
    count total entries for org_id
    if zero: return verified=True, message="No audit entries"
    stream rows WHERE organization_id = :org_id ORDER BY sequence_number ASC
    expected_previous = GENESIS_HASH
    k = derive_org_key(org_id)
    for each row:
        if row.previous_hash != expected_previous:
            return verified=False, first_broken_at=row.sequence_number,
                   message="Chain broken..."
        recomputed = HMAC-SHA-256(k, canonical_payload(row))
        if row.current_hash != recomputed:
            return verified=False, first_broken_at=row.sequence_number,
                   message="HMAC mismatch..."
        expected_previous = row.current_hash
    return verified=True
Chain integrity is verified in a single linear pass. Very long chains (millions of rows) should be streamed rather than loaded into memory, but OSS v0.1 does not do this yet: result.scalars().all() buffers the whole chain. The paid tier replaces this with a persisted "last-verified tip" checkpoint so dashboard integrity checks only verify deltas, with full walks on demand.
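The walk can be exercised without a database. The sketch below is a simplified stand-in, not the code in audit_service.py: rows are an in-memory dataclass (Row, seal and build_chain are hypothetical names), IDs are plain strings, and the key is an arbitrary byte string rather than a derived per-org key. The verifier accepts any iterable, so a streaming cursor could be passed in, and the expected_previous parameter hints at how a walk could resume from a trusted checkpoint.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace
from typing import Iterable

GENESIS_HASH = hashlib.sha256(b"LEX_CUSTIS_GENESIS").hexdigest()


@dataclass(frozen=True)
class Row:
    previous_hash: str
    organization_id: str
    user_id: str
    sequence_number: int
    llm_provider: str
    llm_model: str
    request_at_iso: str
    user_prompt: str
    ai_response: str
    current_hash: str = ""


def canonical_payload(r: Row) -> str:
    def digest(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()
    return "|".join([
        r.previous_hash, r.organization_id, r.user_id, str(r.sequence_number),
        r.llm_provider, r.llm_model, r.request_at_iso,
        digest(r.user_prompt), digest(r.ai_response),
    ])


def mac_hex(key: bytes, r: Row) -> str:
    return hmac.new(key, canonical_payload(r).encode("utf-8"), hashlib.sha256).hexdigest()


def seal(r: Row, key: bytes) -> Row:
    # Return a copy of the row with current_hash filled in.
    return replace(r, current_hash=mac_hex(key, r))


def verify_chain(rows: Iterable[Row], key: bytes,
                 expected_previous: str = GENESIS_HASH) -> dict:
    for row in rows:  # consumes rows one at a time; nothing is buffered here
        if row.previous_hash != expected_previous:
            return {"verified": False, "first_broken_at": row.sequence_number,
                    "message": "Chain broken"}
        if not hmac.compare_digest(row.current_hash, mac_hex(key, row)):
            return {"verified": False, "first_broken_at": row.sequence_number,
                    "message": "HMAC mismatch"}
        expected_previous = row.current_hash
    return {"verified": True, "first_broken_at": None, "message": "OK"}


def build_chain(prompts, key: bytes):
    rows, prev = [], GENESIS_HASH
    for seq, prompt in enumerate(prompts, start=1):
        row = seal(Row(prev, "org-1", "user-1", seq, "mistral",
                       "mistral-large-latest",
                       "2024-05-01T12:30:45.123456+00:00",
                       prompt, "response to " + prompt), key)
        rows.append(row)
        prev = row.current_hash
    return rows


key = b"stand-in-for-a-derived-org-key"
chain = build_chain(["one", "two", "three"], key)

intact = verify_chain(iter(chain), key)
edited = verify_chain([chain[0], replace(chain[1], user_prompt="tampered"), chain[2]], key)
deleted = verify_chain([chain[0], chain[2]], key)

print(intact["verified"])                             # True
print(edited["first_broken_at"], edited["message"])   # 2 HMAC mismatch
print(deleted["first_broken_at"], deleted["message"]) # 3 Chain broken
```

An in-place edit surfaces as an HMAC mismatch on the edited row, while a deletion surfaces as a broken link on the row that follows the gap.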
Threat model
What the chain protects against
- Malicious DBA with live DB credentials. Cannot rewrite or insert entries without knowing K. Verification detects the tamper.
- SQL-injection vulnerability with arbitrary write. Same. An UPDATE audit_logs SET user_prompt = 'tampered' shows up as an HMAC mismatch on the next chain walk.
- Backup-and-restore attack. Rolling back to a prior backup drops later entries, but sequence_number is a unique Postgres sequence: restoring a backup does not regenerate later numbers, so the chain still walks correctly up to the last restored entry. An attacker cannot "hide" an entry by backup-rollback without also resetting the sequence and keeping the trace intact. Any subsequent insert picks up from the pre-rollback sequence value (if you restore the sequence along with the table) or jumps to a fresh one (if you don't); either way this is visible in downstream analysis.
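One form the downstream analysis could take is a scan for missing sequence numbers. The helper below is a hypothetical sketch, not part of the product. It assumes an organisation's numbers are expected to be contiguous from a known start value, which the design above does not state. Postgres sequences can also skip values on their own (for example after a rolled-back insert), so a gap is a prompt to investigate rather than proof of tampering.

```python
from typing import Iterable, List, Tuple


def sequence_gaps(sequence_numbers: Iterable[int], start: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges."""
    gaps = []
    expected = start
    for n in sorted(set(sequence_numbers)):
        if n > expected:
            # Everything from `expected` up to n - 1 is absent.
            gaps.append((expected, n - 1))
        expected = max(expected, n + 1)
    return gaps


# Entries 4 to 6 were dropped by a rollback; later inserts resumed at 7.
gaps = sequence_gaps([1, 2, 3, 7, 8])
print(gaps)  # [(4, 6)]
```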
What it does NOT protect against
- Compromise of K. Whoever holds the master key can forge entries at will. This is why K lives outside Postgres and rotates on suspicion. In commercial edition K is held in KMS and never leaves the HSM envelope.
- Total deletion of all rows without replacement. An attacker with superuser who drops the table wipes the evidence. Protect with WAL archival + off-host backups. Commercial edition ships this.
- Replay of past valid entries. Not relevant: the chain is monotonic and sequence_number is unique, so an old entry cannot be replayed as new. If a bug in application code somehow creates duplicate (org_id, sequence_number) pairs, the unique constraint rejects the insert at the DB layer.
- Side-channel leakage of prompt/response content. The HMAC covers a SHA-256 of each, so the HMAC itself doesn't leak, but the plaintext is still stored in user_prompt/ai_response in v0.1. Column-level envelope encryption with a KMS-wrapped DEK is a v0.2 feature.
Rotating the master key
Do not rotate casually. Rotation invalidates every existing chain unless you keep the old key around for verification of historical entries.
Operationally:
- Keep the current key as AUDIT_HMAC_MASTER_KEY_PREV.
- Set the new key as AUDIT_HMAC_MASTER_KEY.
- New entries chain from the new key (the prev hash of the first new entry is the last-verified old entry's current_hash).
- verify_chain tries the new key first, falls back to AUDIT_HMAC_MASTER_KEY_PREV for older entries (v0.2 feature, not in OSS v0.1).
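Since the fallback is not in OSS v0.1, the following is only a sketch of what "try the new key first, then the previous one" could look like for a single entry. The function matching_key_label is hypothetical, payloads are reduced to opaque strings, and the two byte strings stand in for per-org keys derived from AUDIT_HMAC_MASTER_KEY and AUDIT_HMAC_MASTER_KEY_PREV.

```python
import hashlib
import hmac
from typing import Optional, Sequence, Tuple


def mac_hex(key: bytes, payload: str) -> str:
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def matching_key_label(payload: str, recorded_hash: str,
                       keys: Sequence[Tuple[str, bytes]]) -> Optional[str]:
    """Return the label of the first key that validates the entry, else None."""
    for label, key in keys:
        if hmac.compare_digest(mac_hex(key, payload), recorded_hash):
            return label
    return None


current_key = b"org-key-derived-from-new-master"    # stand-in value
previous_key = b"org-key-derived-from-old-master"   # stand-in value
keys = [("current", current_key), ("previous", previous_key)]  # new key first

old_payload = "payload-written-before-rotation"
new_payload = "payload-written-after-rotation"

old_label = matching_key_label(old_payload, mac_hex(previous_key, old_payload), keys)
new_label = matching_key_label(new_payload, mac_hex(current_key, new_payload), keys)
forged_label = matching_key_label(new_payload, mac_hex(b"unknown-key", new_payload), keys)

print(old_label)     # previous
print(new_label)     # current
print(forged_label)  # None
```

A fuller verifier would probably also refuse entries that validate under the previous key once a later-numbered entry has validated under the current one, so that the retired key cannot be used to append new entries. That policy is a suggestion, not something the design above specifies.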
Alternative: append-chain approach — write a "key rotation" entry whose payload includes the new key fingerprint, chained under the old key. Clean but requires verifier changes. Commercial KMS integration handles this automatically with KMS key versions.
Independent verifier
We'll publish a verify-dossier CLI in v0.2 that:
- Reads a MANIFEST.json + audit_log_period.jsonl from a dossier zip.
- Walks the chain.
- Given an AUDIT_HMAC_MASTER_KEY (or a KMS reference), recomputes every HMAC.
- Outputs a verification report.
A regulator or Big-4 auditor can run this offline without touching your infrastructure.
Code pointer
Implementation: backend/app/services/audit_service.py.
Tests: backend/tests/test_audit_service.py (parametrised
"every field is bound" tests catch regressions where a field is
removed from the payload).