Annex IV dossier bundle

The dossier is the single artefact a Market Surveillance Authority can request to verify your compliance with Arts. 11, 12, 15, 53, and 73 for a given reporting period.

Endpoint

GET /api/v1/compliance/dossier?period_start=<ISO>&period_end=<ISO>

Returns application/zip with Content-Disposition: attachment; filename="dossier_<org_slug>_<YYYYMMDD>_<YYYYMMDD>.zip".

Implemented in backend/app/services/dossier_service.py.
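
A minimal client sketch for fetching a bundle. The endpoint path, query parameter names, and filename pattern come from the description above; the base URL, the bearer-token header, and the use of plain dates for the ISO parameters are assumptions for illustration and may not match your deployment.

```python
from datetime import date
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def dossier_url(base_url: str, period_start: date, period_end: date) -> str:
    """Build the dossier URL for a reporting period."""
    query = urlencode({
        "period_start": period_start.isoformat(),
        "period_end": period_end.isoformat(),
    })
    return f"{base_url.rstrip('/')}/api/v1/compliance/dossier?{query}"


def expected_filename(org_slug: str, period_start: date, period_end: date) -> str:
    """Filename documented for the Content-Disposition header."""
    return f"dossier_{org_slug}_{period_start:%Y%m%d}_{period_end:%Y%m%d}.zip"


def download_dossier(base_url: str, token: str, period_start: date,
                     period_end: date, dest: str) -> None:
    """Download the zip to dest. ASSUMPTION: bearer-token authentication."""
    request = Request(
        dossier_url(base_url, period_start, period_end),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/zip"},
    )
    with urlopen(request) as response, open(dest, "wb") as out:
        out.write(response.read())
```

Comparing the filename in the response header against expected_filename() is a cheap first check that the server returned the period you asked for.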

Zip layout

dossier_<org>_<period>/
├── annex_iv.pdf                    # Art. 11 technical documentation
├── audit_log_period.jsonl          # Art. 12 event logs (HMAC chain)
├── integrity_attestation.json      # Chain verify result at bundle time
├── metrics.json                    # Art. 15 aggregates
├── provider_manifest.json          # Art. 53 upstream GPAI trail
├── incidents.json                  # Art. 73 incidents in period
└── MANIFEST.json                   # File inventory with sha256 per member

annex_iv.pdf

Generated by report_generator.generate(). Sections:

  1. Cover page
  2. System description
  3. Data governance
  4. Interaction summary (period aggregates)
  5. Human oversight (actions breakdown from audit_log_oversight)
  6. Bias monitoring
  7. Integrity verification (chain status)
  8. Risk assessment (boilerplate in OSS; commercial edition customises per deployer)

If WeasyPrint's system dependencies are missing on the host, the generator falls back to annex_iv.html. The files[] array in MANIFEST.json records the actual filename, so verification works in either case.

audit_log_period.jsonl

One JSON line per audit entry in the period, ordered by sequence_number ASC. Each line contains every field that went into the HMAC payload, plus the stored previous_hash and current_hash. A regulator can verify the chain offline given the per-org HMAC key.
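
Before recomputing any HMACs, a quick structural pass can catch reordered or missing lines. The sketch below uses the field names sequence_number, previous_hash, and current_hash from the description above. It assumes that each entry's previous_hash stores the preceding entry's current_hash, which is the usual meaning in a hash chain but is not spelled out here. It does not replace the keyed verification.

```python
import json


def check_linkage(jsonl_lines):
    """Return a list of ordering or linkage problems (empty if none)."""
    problems = []
    previous = None
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate a trailing blank line
        entry = json.loads(line)
        if previous is not None:
            if entry["sequence_number"] <= previous["sequence_number"]:
                problems.append(f"out of order at {entry['sequence_number']}")
            # ASSUMPTION: previous_hash equals the prior entry's current_hash.
            if entry["previous_hash"] != previous["current_hash"]:
                problems.append(f"broken link at {entry['sequence_number']}")
        previous = entry
    return problems
```

The first entry in a period is not checked against anything, because its predecessor may lie outside the exported range.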

integrity_attestation.json

The result of audit_service.verify_chain() at the moment the bundle was built. Fields: verified, total_entries, entries_checked, first_broken_at, message, plus bundle metadata.

metrics.json

Art. 15 aggregates over the period:

{
  "format": "lex-custis/metrics/v1",
  "period_start": "...",
  "period_end": "...",
  "total_interactions": 847,
  "confidence": { "avg": 0.72, "min": 0.31, "max": 0.98 },
  "pii":  { "detections": 12, "rate": 0.0142 },
  "bias": { "flagged_responses": 3, "rate": 0.0035 },
  "human_oversight": { "actions_recorded": 102, "rate": 0.1204 },
  "regulation": "EU AI Act Art. 15 — accuracy, robustness, cybersecurity (aggregates only; drift detection in commercial edition)"
}
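
The sample values are consistent with each rate being the corresponding count divided by total_interactions, rounded to four decimals (for example 12 / 847 ≈ 0.0142). That formula is inferred from the sample, not confirmed by the service code, so treat the following as a sanity check and not as a specification.

```python
def recompute_rates(metrics: dict) -> dict:
    """Recompute the per-category rates from the raw counts."""
    total = metrics["total_interactions"]
    counts = {
        "pii": metrics["pii"]["detections"],
        "bias": metrics["bias"]["flagged_responses"],
        "human_oversight": metrics["human_oversight"]["actions_recorded"],
    }
    # ASSUMPTION: rate = count / total_interactions, rounded to 4 decimals.
    return {
        name: round(count / total, 4) if total else 0.0
        for name, count in counts.items()
    }


def rate_mismatches(metrics: dict) -> list:
    """List the categories whose stored rate differs from the recomputed one."""
    recomputed = recompute_rates(metrics)
    return [name for name, rate in recomputed.items()
            if abs(metrics[name]["rate"] - rate) > 1e-9]
```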

provider_manifest.json

Distinct (provider, model) pairs seen in the period:

{
  "format": "lex-custis/provider-manifest/v1",
  "period_start": "...",
  "period_end": "...",
  "providers": [
    {
      "provider": "mistral",
      "model": "mistral-large-latest",
      "call_count": 842,
      "first_seen": "...",
      "last_seen": "...",
      "upstream_disclosure": {
        "policy_url": "https://mistral.ai/legal/",
        "note": "Mistral models — see provider's Art. 53 GPAI disclosure..."
      }
    },
    {
      "provider": "ollama",
      "model": "gemma3:4b",
      "call_count": 5,
      "first_seen": "...",
      "last_seen": "...",
      "upstream_disclosure": {
        "policy_url": "https://ollama.com/",
        "note": "Self-hosted Ollama runtime..."
      }
    }
  ],
  "regulation": "EU AI Act Art. 53 — GPAI provider disclosures"
}
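
To illustrate what "distinct (provider, model) pairs" means, the sketch below groups audit entries by pair and derives call_count, first_seen, and last_seen. It is not the service's implementation. It assumes the entries expose llm_provider, llm_model, and timestamp_iso (the names used in the HMAC payload) and that all timestamps share one ISO format and time zone, so that string comparison orders them correctly.

```python
from collections import defaultdict


def summarise_providers(entries):
    """Group audit entries by (provider, model) and summarise each group."""
    groups = defaultdict(list)
    for entry in entries:
        key = (entry["llm_provider"], entry["llm_model"])
        groups[key].append(entry["timestamp_iso"])
    return [
        {
            "provider": provider,
            "model": model,
            "call_count": len(stamps),
            "first_seen": min(stamps),  # relies on uniform ISO timestamps
            "last_seen": max(stamps),
        }
        for (provider, model), stamps in sorted(groups.items())
    ]
```

Running this over audit_log_period.jsonl gives an independent count to compare against the call_count values in the shipped manifest.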

incidents.json

The Art. 73 incidents whose detection_ts falls within the period, each rendered via regulator_export(incident) plus an sla_status_at_bundle snapshot.

MANIFEST.json

The bundle's root of trust:

{
  "format": "lex-custis/annex-iv-dossier/v1",
  "organization_id": "...",
  "organization_name": "...",
  "period_start": "...",
  "period_end": "...",
  "generated_at": "...",
  "generated_by_user_id": "...",
  "files": [
    { "name": "annex_iv.pdf",
      "size_bytes": 283415,
      "sha256": "abc123..." },
    { "name": "audit_log_period.jsonl",
      "size_bytes": 1128345,
      "sha256": "def456..." },
    ...
  ],
  "regulation_coverage": {
    "art_11": "annex_iv.pdf",
    "art_12": "audit_log_period.jsonl",
    "art_15": "metrics.json",
    "art_53": "provider_manifest.json",
    "art_73": "incidents.json"
  },
  "integrity_verified_at_bundle_time": true
}
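
Checking an extracted bundle against this inventory can be sketched as follows. The keys files, name, size_bytes, and sha256 are taken from the sample above; the assumption that sha256 is a hex digest, and the function name itself, are illustrative.

```python
import hashlib
import json
from pathlib import Path


def verify_manifest(bundle_dir) -> list:
    """Return a list of problems; an empty list means every member matches."""
    bundle = Path(bundle_dir)
    manifest = json.loads((bundle / "MANIFEST.json").read_text(encoding="utf-8"))
    problems = []
    for member in manifest["files"]:
        path = bundle / member["name"]
        if not path.is_file():
            problems.append(f"missing: {member['name']}")
            continue
        data = path.read_bytes()
        if len(data) != member["size_bytes"]:
            problems.append(f"size mismatch: {member['name']}")
        # ASSUMPTION: sha256 is stored as a hex string.
        if hashlib.sha256(data).hexdigest() != member["sha256"].lower():
            problems.append(f"sha256 mismatch: {member['name']}")
    return problems
```

Note that this only proves the members match the manifest. In the OSS edition the manifest itself is unsigned, so it must reach the verifier over a trusted channel.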

In commercial editions, MANIFEST.json is signed with Ed25519 and time-stamped via RFC 3161. The public verification key is published at a well-known URL and in the repo, so a regulator can verify the bundle without having to contact anyone.

What's not in the bundle

  • Raw training-data statistics (Art. 10 full dataset registry is paid).
  • Art. 9 risk register (paid).
  • Art. 27 FRIA outputs (paid).
  • Deployer-specific IFUs (the commercial edition customises the PDF per deployer).

If a regulator asks specifically for any of these, you'll need to supply them separately in OSS v0.1. The commercial edition adds them to the same bundle.

Offline verification

Even without a CLI, the verification procedure is reproducible by hand:

  1. Extract the zip.
  2. Confirm every file listed in MANIFEST.json.files[] exists and its sha256 matches.
  3. Read audit_log_period.jsonl in sequence_number order.
  4. For each entry, recompute:
    payload = previous_hash | org_id | user_id | sequence_number
            | llm_provider | llm_model | timestamp_iso
            | sha256(user_prompt) | sha256(ai_response)
    expected = HMAC-SHA-256(k_org, payload)
    
    where k_org = HKDF-SHA-256(K, salt=org_id.bytes, info=b"lex-custis/audit-log/per-org-hmac-key/v1", length=32 bytes) and K is the HKDF input keying material.
  5. Confirm expected == entry.current_hash for all entries.

If this walk succeeds, the chain is intact. If step 5 fails, the first failing sequence_number localises the tamper.
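
Steps 3 to 5 can be sketched with the Python standard library alone. HKDF is written out per RFC 5869 because the standard library has no HKDF helper. Several details are assumptions and are marked in the comments: that the payload fields are joined with a literal "|" and UTF-8 encoded, that hashes are stored as hex strings, that org_id is a UUID string, and that each entry carries the raw user_prompt and ai_response under those keys. Check them against dossier_service.py before relying on the result.

```python
import hashlib
import hmac
import uuid

INFO = b"lex-custis/audit-log/per-org-hmac-key/v1"


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF-SHA-256 (RFC 5869): extract, then expand to `length` bytes."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_org_key(master_key: bytes, org_id: str) -> bytes:
    """k_org as described in step 4. ASSUMPTION: org_id is a UUID string."""
    return hkdf_sha256(master_key, salt=uuid.UUID(org_id).bytes, info=INFO, length=32)


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def expected_hash(k_org: bytes, entry: dict) -> str:
    """Recompute the HMAC for one entry."""
    # ASSUMPTION: fields are joined with a literal "|" and UTF-8 encoded,
    # and the entry holds the raw prompt and response text.
    parts = [
        entry["previous_hash"], entry["org_id"], entry["user_id"],
        str(entry["sequence_number"]), entry["llm_provider"], entry["llm_model"],
        entry["timestamp_iso"],
        sha256_hex(entry["user_prompt"]), sha256_hex(entry["ai_response"]),
    ]
    payload = "|".join(parts).encode("utf-8")
    return hmac.new(k_org, payload, hashlib.sha256).hexdigest()


def first_broken_at(master_key: bytes, entries: list):
    """Return the first failing sequence_number, or None if the chain holds."""
    for entry in sorted(entries, key=lambda e: e["sequence_number"]):
        k_org = derive_org_key(master_key, entry["org_id"])
        if not hmac.compare_digest(expected_hash(k_org, entry), entry["current_hash"]):
            return entry["sequence_number"]
    return None
```

hmac.compare_digest is used for the comparison out of habit; timing side channels matter little in an offline audit, but it costs nothing.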

A v0.2 CLI (verify-dossier) will automate this.