Tamper-evident hash chain

This is the core cryptographic claim the product makes. If this design is sound, an attacker with Postgres access cannot rewrite the audit log without detection. If it's broken, every other promise collapses.

This page walks the design in enough detail that a regulator's technical team or an external auditor can reproduce and review it.

The claim (formal)

Given:

A master secret K held outside Postgres (an environment variable in OSS v0.1; KMS-wrapped in the commercial edition).
A per-organisation subkey k_{org} = HKDF(K, salt=org_id, info=…, 32).
A canonical per-entry payload p_i = previous_hash | org_id | user_id | sequence_number | provider | model | timestamp | sha256(prompt) | sha256(response).
Each entry's recorded hash h_i = HMAC-SHA-256(k_{org}, p_i).

An adversary with full read/write on audit_logs but no access to K cannot:

Modify any entry's user_prompt, ai_response, llm_provider, llm_model, request_at, user_id, organization_id, or sequence_number and supply a current_hash that matches the modified payload.
Insert a forged entry with a chosen current_hash that validates against the chain.
Delete an entry and re-link the chain around it.

The chain-walker in audit_service.verify_chain() recomputes every h_i using the live K and flags the first broken entry with its sequence_number.

Why HMAC, not plain SHA-256

The previous design was sha256(prev | prompt | response | model | timestamp). An attacker with DB access can:

Edit the target entry's content.
Recompute current_hash.
Walk the chain forward, recomputing every subsequent current_hash and previous_hash.
verify_chain(), which has only the same public hash function, sees a valid chain.

Plain hashes authenticate continuity, not authorship. Only a keyed MAC bound to a secret the attacker doesn't control protects against the rewrite attack.
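A toy chain makes the rewrite attack concrete. This is a minimal sketch, not Lex Custis code: each entry carries a single content string instead of the full payload, and the helpers (plain_hash, keyed_hash, build_chain, rewrite) are hypothetical names introduced for illustration. The attacker runs the same forward-recompute loop in both cases. Only the keyed version fails verification, because the attacker's guessed key cannot reproduce the MAC.

```python
import hashlib
import hmac
from functools import partial

ZERO = "0" * 64  # stand-in for the genesis link in this toy example


def plain_hash(prev: str, content: str) -> str:
    # Old design: unkeyed SHA-256, computable by anyone with DB access.
    return hashlib.sha256(f"{prev}|{content}".encode("utf-8")).hexdigest()


def keyed_hash(key: bytes, prev: str, content: str) -> str:
    # New design: HMAC-SHA-256 under a secret key.
    msg = f"{prev}|{content}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def build_chain(contents, link):
    chain, prev = [], ZERO
    for content in contents:
        current = link(prev, content)
        chain.append({"content": content, "previous_hash": prev, "current_hash": current})
        prev = current
    return chain


def verify(chain, link) -> bool:
    prev = ZERO
    for entry in chain:
        if entry["previous_hash"] != prev:
            return False
        if entry["current_hash"] != link(prev, entry["content"]):
            return False
        prev = entry["current_hash"]
    return True


def rewrite(chain, index, new_content, link):
    # Attacker edits one entry, then recomputes every later link
    # using whatever hash function they can evaluate themselves.
    forged = [dict(entry) for entry in chain]
    forged[index]["content"] = new_content
    prev = forged[index]["previous_hash"]
    for entry in forged[index:]:
        entry["previous_hash"] = prev
        entry["current_hash"] = link(prev, entry["content"])
        prev = entry["current_hash"]
    return forged


contents = ["prompt A", "prompt B", "prompt C"]

plain_chain = build_chain(contents, plain_hash)
forged_plain = rewrite(plain_chain, 1, "tampered", plain_hash)
print(verify(forged_plain, plain_hash))  # True: the rewrite goes unnoticed

real = partial(keyed_hash, b"secret K held outside Postgres")
guess = partial(keyed_hash, b"attacker's best guess")
hmac_chain = build_chain(contents, real)
forged_hmac = rewrite(hmac_chain, 1, "tampered", guess)
print(verify(forged_hmac, real))  # False: HMAC mismatch at the edited entry
```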

Payload canonicalisation

payload = (
    previous_hash                    # hex string, 64 chars
    + "|" + str(org_id)              # UUID, stringified
    + "|" + str(user_id)             # UUID, stringified
    + "|" + str(sequence_no)         # decimal integer
    + "|" + llm_provider             # e.g. "mistral"
    + "|" + llm_model                # e.g. "mistral-large-latest"
    + "|" + timestamp_iso            # request_at in ISO 8601
    + "|" + sha256_hex(user_prompt.encode("utf-8"))
    + "|" + sha256_hex(ai_response.encode("utf-8"))
)

The prompt and response are hashed before concatenation so a pipe character in user input cannot masquerade as a field separator.
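The same canonicalisation can be written as a Python function. This is a sketch that follows the field order above; the function names, the positional signature, and the choice to UTF-8 encode the joined string before passing it to HMAC are assumptions, not a copy of audit_service.py. The usage lines show why the prompt and response are hashed first: with raw strings, moving a pipe from the prompt into the response would produce an identical payload.

```python
import hashlib
import hmac
import uuid
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_payload(previous_hash: str, org_id: uuid.UUID, user_id: uuid.UUID,
                      sequence_no: int, llm_provider: str, llm_model: str,
                      request_at: datetime, user_prompt: str, ai_response: str) -> bytes:
    fields = [
        previous_hash,                              # 64-char hex string
        str(org_id),
        str(user_id),
        str(sequence_no),
        llm_provider,
        llm_model,
        request_at.isoformat(),
        sha256_hex(user_prompt.encode("utf-8")),    # fixed-length, pipe-free
        sha256_hex(ai_response.encode("utf-8")),
    ]
    # Assumption: the joined string is UTF-8 encoded before it is MACed.
    return "|".join(fields).encode("utf-8")


def entry_hash(k_org: bytes, payload: bytes) -> str:
    # h_i = HMAC-SHA-256(k_org, p_i), stored as hex in current_hash.
    return hmac.new(k_org, payload, hashlib.sha256).hexdigest()


org, user = uuid.uuid4(), uuid.uuid4()
ts = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
prefix = ("0" * 64, org, user, 1, "mistral", "mistral-large-latest", ts)

# Raw concatenation would be ambiguous: both splits give "a|b|c".
print("a|b" + "|" + "c" == "a" + "|" + "b|c")  # True

# Hashing prompt and response first keeps the two payloads distinct.
p1 = canonical_payload(*prefix, "a|b", "c")
p2 = canonical_payload(*prefix, "a", "b|c")
print(p1 == p2)  # False
```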

timestamp_iso must be the exact string that datetime.isoformat() produced in Python at insertion time; recomputation on read uses entry.request_at.isoformat(). Any change to the value on the round trip alters the payload: if your database or driver stores microsecond precision but returns only milliseconds, verification will break. Lex Custis uses timestamptz at its default precision, which is microseconds and matches Python's datetime resolution, so precision is consistent on both sides.
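Comparing the isoformat() strings directly shows the failure mode. This sketch simulates two read paths that would break recomputation: one that truncates to milliseconds, and one that returns the same instant under a different UTC offset. The second case goes slightly beyond the text above; treat it as an assumption worth checking against your driver and session time zone settings, not as a description of Lex Custis behaviour.

```python
from datetime import datetime, timedelta, timezone

# A request_at value as written at insertion time (microsecond precision, UTC).
written = datetime(2025, 3, 14, 9, 26, 53, 589793, tzinfo=timezone.utc)

# Failure mode 1: precision loss. Simulate a read path that keeps only milliseconds.
truncated = written.replace(microsecond=(written.microsecond // 1000) * 1000)

# Failure mode 2 (assumption to check): same instant, different UTC offset on read.
shifted = written.astimezone(timezone(timedelta(hours=2)))

print(written.isoformat())    # 2025-03-14T09:26:53.589793+00:00
print(truncated.isoformat())  # 2025-03-14T09:26:53.589000+00:00
print(shifted.isoformat())    # 2025-03-14T11:26:53.589793+02:00
print(shifted == written)     # True: equal instants, yet the strings differ
```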

Per-org key derivation (HKDF)

k_org = HKDF-SHA-256(
    IKM   = AUDIT_HMAC_MASTER_KEY (utf-8 bytes),
    salt  = org_id.bytes (16 bytes),
    info  = b"lex-custis/audit-log/per-org-hmac-key/v1",
    L     = 32 bytes,
)

Using org_id.bytes as the salt means two different organisations derive independent keys (a collision is cryptographically negligible), and because the derivation is deterministic, the same organisation derives the same key across backend restarts.
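For reviewers who want to reproduce k_org, here is a self-contained HKDF-SHA-256 (the RFC 5869 extract-then-expand construction) using only the standard library, plus a derive_org_key helper built from the parameters above. The helper name and its (master_key, org_id) signature are hypothetical. The `cryptography` package's HKDF class implements the same RFC 5869 construction; this page does not say which implementation Lex Custis uses.

```python
import hashlib
import hmac
import uuid

HASH_LEN = 32  # SHA-256 output size in bytes
INFO = b"lex-custis/audit-log/per-org-hmac-key/v1"


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF output length too large")
    # Extract: PRK = HMAC-SHA-256(key=salt, msg=IKM).
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC-SHA-256(PRK, T(i-1) | info | i), for i = 1, 2, ...
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_org_key(master_key: str, org_id: uuid.UUID) -> bytes:
    # IKM = master key as UTF-8 bytes, salt = 16-byte org UUID, L = 32.
    return hkdf_sha256(master_key.encode("utf-8"), org_id.bytes, INFO, 32)


org_a, org_b = uuid.uuid4(), uuid.uuid4()
k_a = derive_org_key("example-master-key", org_a)
print(len(k_a))                                            # 32
print(k_a == derive_org_key("example-master-key", org_a))  # True: deterministic
print(k_a == derive_org_key("example-master-key", org_b))  # False: per-org keys
```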

GENESIS_HASH

The first entry per org uses previous_hash = sha256(b"LEX_CUSTIS_GENESIS"). Every org's chain therefore starts from the same constant, but diverges immediately because k_org is org-specific.

Verification procedure

audit_service.verify_chain(org_id, db):

count total entries for org_id
if zero: return verified=True, message="No audit entries"

stream rows WHERE organization_id = :org_id ORDER BY sequence_number ASC
expected_previous = GENESIS_HASH
k = derive_org_key(org_id)

for each row:
    if row.previous_hash != expected_previous:
        return verified=False, first_broken_at=row.sequence_number,
               message="Chain broken..."

    recomputed = HMAC-SHA-256(k, canonical_payload(row))
    if row.current_hash != recomputed:
        return verified=False, first_broken_at=row.sequence_number,
               message="HMAC mismatch..."

    expected_previous = row.current_hash

return verified=True

Chain integrity is verified in a single linear pass. For very long chains (millions of rows) the walk should stream rows rather than load them into memory, but OSS v0.1 does not yet do this: result.scalars().all() buffers the whole chain. The paid tier adds a persisted "last-verified tip" checkpoint, so dashboard integrity checks verify only the delta since that tip, with full walks available on demand.
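The pseudocode translates into a short, self-contained Python sketch that runs against in-memory rows instead of a database session. AuditEntry, seal, append, and this verify_chain(master_key, org_id, rows) signature are hypothetical stand-ins, not the real audit_service API. The sketch assumes GENESIS_HASH is the hex digest (matching the 64-character previous_hash format) and specialises HKDF to its single expand block, which is exact for a 32-byte output. Because rows can be any iterable, a generator over a streamed query result would keep memory flat; the empty-chain case is detected after the loop instead of with a separate count.

```python
import hashlib
import hmac
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional

# Assumption: hex digest, matching the 64-char hex previous_hash format.
GENESIS_HASH = hashlib.sha256(b"LEX_CUSTIS_GENESIS").hexdigest()
HKDF_INFO = b"lex-custis/audit-log/per-org-hmac-key/v1"


@dataclass
class AuditEntry:  # hypothetical in-memory stand-in for an audit_logs row
    sequence_number: int
    organization_id: uuid.UUID
    user_id: uuid.UUID
    llm_provider: str
    llm_model: str
    request_at: datetime
    user_prompt: str
    ai_response: str
    previous_hash: str = ""
    current_hash: str = ""


@dataclass
class VerifyResult:
    verified: bool
    first_broken_at: Optional[int] = None
    message: str = ""


def derive_org_key(master_key: str, org_id: uuid.UUID) -> bytes:
    # HKDF-SHA-256 with L = 32: extract, then the single expand block T(1).
    prk = hmac.new(org_id.bytes, master_key.encode("utf-8"), hashlib.sha256).digest()
    return hmac.new(prk, HKDF_INFO + b"\x01", hashlib.sha256).digest()


def _sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def seal(k: bytes, e: AuditEntry) -> str:
    # HMAC-SHA-256 over the canonical "|"-joined payload.
    payload = "|".join([
        e.previous_hash, str(e.organization_id), str(e.user_id),
        str(e.sequence_number), e.llm_provider, e.llm_model,
        e.request_at.isoformat(), _sha256_hex(e.user_prompt), _sha256_hex(e.ai_response),
    ])
    return hmac.new(k, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def append(master_key: str, chain: List[AuditEntry], e: AuditEntry) -> None:
    # Insertion side: link to the previous entry (or genesis), then seal.
    e.previous_hash = chain[-1].current_hash if chain else GENESIS_HASH
    e.current_hash = seal(derive_org_key(master_key, e.organization_id), e)
    chain.append(e)


def verify_chain(master_key: str, org_id: uuid.UUID,
                 rows: Iterable[AuditEntry]) -> VerifyResult:
    # rows must be this org's entries ordered by sequence_number ASC.
    k = derive_org_key(master_key, org_id)
    expected_previous = GENESIS_HASH
    seen = 0
    for row in rows:
        seen += 1
        if row.previous_hash != expected_previous:
            return VerifyResult(False, row.sequence_number, "Chain broken")
        recomputed = seal(k, row)
        if not hmac.compare_digest(row.current_hash.encode("utf-8"), recomputed.encode("ascii")):
            return VerifyResult(False, row.sequence_number, "HMAC mismatch")
        expected_previous = row.current_hash
    if seen == 0:
        return VerifyResult(True, message="No audit entries")
    return VerifyResult(True)


org, user = uuid.uuid4(), uuid.uuid4()
chain: List[AuditEntry] = []
for n in range(1, 4):
    ts = datetime(2025, 1, 1, 12, 0, n, tzinfo=timezone.utc)
    append("example-master-key", chain,
           AuditEntry(n, org, user, "mistral", "mistral-large-latest", ts, f"prompt {n}", f"answer {n}"))

print(verify_chain("example-master-key", org, chain).verified)  # True
chain[1].user_prompt = "tampered"
print(verify_chain("example-master-key", org, chain).first_broken_at)  # 2
```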

Threat model

What the chain protects against

Malicious DBA with live DB credentials. Without K, any rewrite or insertion fails verification, so the chain walk detects the tampering.
SQL-injection vulnerability with arbitrary write. Same. An UPDATE audit_logs SET user_prompt = 'tampered' shows up as an HMAC mismatch on the next chain walk.
Backup-and-restore attack. Rolling back to an earlier backup drops every entry written after it. The surviving prefix is still a valid chain, so verify_chain() alone does not flag the rollback; detection relies on sequence_number, which comes from a Postgres sequence. If the sequence is not restored along with the table, the next insert jumps past the dropped numbers and leaves a gap. If it is restored, the dropped numbers are reissued to new entries with different hashes, which contradicts any earlier export or recorded chain tip. Either way the rollback is visible in downstream analysis.
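Because the surviving prefix of a rolled-back chain still verifies, catching a rollback needs a reference point the attacker cannot rewrite. A minimal sketch, assuming a previously recorded (sequence_number, current_hash) pair kept off-host: the check_against_tip helper and the practice of storing such a tip outside Postgres are illustrative assumptions, not a documented v0.1 feature, and the hash strings below are dummy placeholders. It distinguishes the two outcomes described above: the recorded entry is missing, or its sequence number now carries a different hash.

```python
from typing import Sequence, Tuple

Tip = Tuple[int, str]  # (sequence_number, current_hash), recorded off-host


def check_against_tip(rows: Sequence[Tip], tip: Tip) -> str:
    """Compare an already-verified live chain with an earlier recorded tip."""
    tip_seq, tip_hash = tip
    live = dict(rows)  # sequence_number -> current_hash
    if tip_seq not in live:
        return f"rollback suspected: entry {tip_seq} is missing"
    if live[tip_seq] != tip_hash:
        return f"rollback suspected: entry {tip_seq} was reissued"
    return "ok: recorded tip still present"


recorded_tip = (3, "c3")  # dummy hash values for illustration
print(check_against_tip([(1, "a1"), (2, "b2"), (3, "c3"), (4, "d4")], recorded_tip))
# ok: recorded tip still present
print(check_against_tip([(1, "a1"), (2, "b2"), (7, "x7")], recorded_tip))
# rollback suspected: entry 3 is missing
print(check_against_tip([(1, "a1"), (2, "b2"), (3, "z3")], recorded_tip))
# rollback suspected: entry 3 was reissued
```

Since each current_hash commits to its predecessor through previous_hash, a matching tip on a chain that already passed verify_chain() also vouches for every entry before it.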

What it does NOT protect against

Compromise of K. Whoever holds the master key can forge entries at will. This is why K lives outside Postgres and is rotated on any suspicion of compromise. In the commercial edition, K is held in KMS and never leaves the HSM envelope.
Total deletion of all rows without replacement. An attacker with superuser access who drops the table wipes the evidence. Protect against this with WAL archival and off-host backups; the commercial edition ships this.
Replay of past valid entries. This is not a practical gap: the chain is monotonic and each entry's HMAC binds its previous_hash and sequence_number, so an old entry cannot be replayed as new. If a bug in application code ever produced a duplicate (org_id, sequence_number) pair, the unique constraint would reject it at the DB layer.
Side-channel leakage of prompt/response content. The HMAC covers a SHA-256 of each, so the HMAC itself doesn't leak, but the plaintext is still stored in user_prompt / ai_response in v0.1. Column-level envelope encryption with a KMS-wrapped DEK is a v0.2 feature.

Rotating the master key

Do not rotate casually. Rotation invalidates every existing chain unless you keep the old key around for verification of historical entries.

Operationally:

Move the current key to AUDIT_HMAC_MASTER_KEY_PREV.
Set the new key as AUDIT_HMAC_MASTER_KEY.
Sign new entries with the new key. The first new entry's previous_hash is the current_hash of the last entry signed under the old key.
verify_chain tries the new key first and falls back to AUDIT_HMAC_MASTER_KEY_PREV for older entries (a v0.2 feature, not in OSS v0.1).

Alternative: append-chain approach — write a "key rotation" entry whose payload includes the new key fingerprint, chained under the old key. Clean but requires verifier changes. Commercial KMS integration handles this automatically with KMS key versions.
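The key fallback described in the rotation steps above could look roughly like the following. This is a hypothetical sketch: verify_with_rotation, its arguments, and the rule that entries may not revert to the previous key once the new key has appeared are assumptions, not documented behaviour. It covers only MAC key selection; a full verifier would also check previous_hash linkage, and the payloads here are dummy byte strings standing in for canonical payloads. The no-revert rule matters when rotation happens because K is suspected compromised: without it, whoever holds the old key could keep appending entries that still verify.

```python
import hashlib
import hmac
from typing import Iterable, Optional, Tuple


def mac(key: bytes, payload: bytes) -> str:
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_with_rotation(entries: Iterable[Tuple[bytes, str]],
                         k_new: bytes, k_prev: bytes) -> Optional[int]:
    """Return the index of the first entry that fails, or None if all verify.

    entries: (canonical payload, stored current_hash) pairs in chain order;
    k_new / k_prev: per-org keys derived from the new and previous master keys.
    """
    rotated = False
    for i, (payload, stored) in enumerate(entries):
        stored_b = stored.encode("utf-8")
        if hmac.compare_digest(mac(k_new, payload).encode("ascii"), stored_b):
            rotated = True  # from here on, only the new key is acceptable
        elif not rotated and hmac.compare_digest(mac(k_prev, payload).encode("ascii"), stored_b):
            continue        # historical entry signed before the rotation
        else:
            return i
    return None


k_prev, k_new = b"old per-org key", b"new per-org key"
history = [(b"entry-1", mac(k_prev, b"entry-1")),
           (b"entry-2", mac(k_prev, b"entry-2")),
           (b"entry-3", mac(k_new, b"entry-3"))]
print(verify_with_rotation(history, k_new, k_prev))  # None: all entries verify

late_forgery = history + [(b"entry-4", mac(k_prev, b"entry-4"))]
print(verify_with_rotation(late_forgery, k_new, k_prev))  # 3: old key used after rotation
```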

Independent verifier

We'll publish a verify-dossier CLI in v0.2 that:

Reads a MANIFEST.json + audit_log_period.jsonl from a dossier zip.
Walks the chain.
Given an AUDIT_HMAC_MASTER_KEY (or a KMS reference), recomputes every HMAC.
Outputs a verification report.

A regulator or Big-4 auditor can run this offline without touching your infrastructure.

Code pointer

Implementation: backend/app/services/audit_service.py.

Tests: backend/tests/test_audit_service.py (parametrised "every field is bound" tests catch regressions where a field is removed from the payload).