Audit Chain (Tamper-Evident Audit)

Operator runbook for IdentityMesh’s append-only hash chain over IM_AdminAudit and IM_ObjectAudit. The chain lets you answer the auditor question raised in the SOC 2 readiness review: “Could a rogue DBA hide their tracks?”

What it is

Every new audit row carries two extra columns:

Column           Type      Meaning
RowHash          CHAR(64)  Hex-encoded SHA-256 of PreviousRowHash || canonical-form-of-this-row
PreviousRowHash  CHAR(64)  The previous row’s RowHash, or NULL for the very first row

When IdentityMesh writes an audit row, it:

  1. Reads the most-recent existing RowHash on the same table.
  2. Sets the new row’s PreviousRowHash to that value.
  3. Computes the new RowHash = SHA-256(prev || canonical-form-of-this-row).
  4. Inserts the new row inside the same transaction (serializable isolation on SQL Server) so the read-then-write is atomic.

The result is a chain: each row’s hash depends on every previous row’s content, so a row edited anywhere in the table no longer reproduces its stored hash, and covering that up would mean rewriting the hash of every row that follows it.
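
The write path above can be sketched in a few lines of Python. This is an illustration only, not IdentityMesh code: the in-memory list stands in for the audit table, the names append_row and first_broken_index are hypothetical, and the canonical form is passed in as an opaque string.

```python
import hashlib

def row_hash(prev_hash, canonical):
    """Hex SHA-256 over prev || canonical (prev is "" for the first row)."""
    return hashlib.sha256(((prev_hash or "") + canonical).encode("utf-8")).hexdigest()

def append_row(chain, canonical):
    """Steps 1-4: read the latest RowHash, link to it, hash, insert."""
    prev = chain[-1]["RowHash"] if chain else None      # step 1
    row = {
        "Canonical": canonical,
        "PreviousRowHash": prev,                          # step 2
        "RowHash": row_hash(prev, canonical),             # step 3
    }
    chain.append(row)   # step 4 (a real store needs a transaction here)
    return row

def first_broken_index(chain):
    """Return the index of the first row that fails to reproduce, or None."""
    prev = None
    for i, row in enumerate(chain):
        link_ok = row["PreviousRowHash"] == prev
        hash_ok = row["RowHash"] == row_hash(prev, row["Canonical"])
        if not (link_ok and hash_ok):
            return i
        prev = row["RowHash"]
    return None

chain = []
for text in ("Action=login\n", "Action=grant-role\n", "Action=logout\n"):
    append_row(chain, text)

intact = first_broken_index(chain)           # nothing has been touched yet
chain[1]["Canonical"] = "Action=nothing\n"   # simulate an in-place edit
tampered = first_broken_index(chain)         # points at the edited row
```

The sketch has no concurrency control; in the real write path the serializable transaction is what stops two writers from linking to the same previous row.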

What it protects against

What it does NOT protect against

Canonical form (the contract)

The hash inputs need to be stable across processes and across implementations (you may want an external auditor to verify the chain without IdentityMesh code). The format is line-oriented: one Name=value line per column, in the order listed, each line terminated with \n.

For IM_AdminAudit:

WhenUtc=<ISO 8601 UTC, 7 fractional digits, e.g. 2026-04-25T13:14:15.1234567Z>
ActorUpn=<value or empty>
ActorSid=<value or empty>
Action=<value>
TargetKind=<value or empty>
TargetId=<value or empty>
StatusCode=<integer>
CorrelationId=<value or empty>
BeforeJson=<value or empty>
AfterJson=<value or empty>

For IM_ObjectAudit:

MeshObjectId=<lowercase hyphenated GUID>
CSObjectId=<lowercase hyphenated GUID>
ChangeType=<value>
Source=<value>
ChangeJson=<value>
ChangedOn=<ISO 8601 UTC, 7 fractional digits>
ActorUpn=<value or empty>
ActorSid=<value or empty>
ActorSource=<value>

Then RowHash = lowercase-hex( SHA-256( utf8( PreviousRowHash || canonical ) ) ), where PreviousRowHash is the empty string (not the literal word “null”) for the first row in the table.
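
For an external verifier, the IM_AdminAudit contract above might be implemented roughly as follows. Treat this as a sketch under stated assumptions rather than the reference implementation: it assumes NULL columns render as an empty value, that the last line is also terminated with a line feed, and that WhenUtc arrives already formatted with 7 fractional digits. The sample row values are invented for illustration.

```python
import hashlib

# Column order taken from the IM_AdminAudit listing above.
ADMIN_AUDIT_COLUMNS = [
    "WhenUtc", "ActorUpn", "ActorSid", "Action", "TargetKind",
    "TargetId", "StatusCode", "CorrelationId", "BeforeJson", "AfterJson",
]

def canonical_admin_audit(row):
    """Build the canonical form: one Name=value line per column.

    Assumptions: None renders as empty, and every line (including the
    last one) ends with a single line feed.
    """
    lines = []
    for name in ADMIN_AUDIT_COLUMNS:
        value = row.get(name)
        lines.append(f"{name}={'' if value is None else value}\n")
    return "".join(lines)

def row_hash(previous_row_hash, canonical):
    """lowercase-hex(SHA-256(utf8(PreviousRowHash || canonical)))."""
    prev = previous_row_hash or ""   # NULL in the table -> empty string in the input
    return hashlib.sha256((prev + canonical).encode("utf-8")).hexdigest()

# Hypothetical row; only the WhenUtc format comes from the runbook.
row = {
    "WhenUtc": "2026-04-25T13:14:15.1234567Z",
    "ActorUpn": "alice@example.com",
    "ActorSid": None,
    "Action": "RoleGranted",
    "TargetKind": "User",
    "TargetId": "42",
    "StatusCode": 200,
    "CorrelationId": None,
    "BeforeJson": None,
    "AfterJson": '{"role":"auditor"}',
}

canonical = canonical_admin_audit(row)
first_hash = row_hash(None, canonical)         # first row: PreviousRowHash is NULL
second_hash = row_hash(first_hash, canonical)  # same content, chained to the first row
```

How line feeds embedded in BeforeJson or AfterJson are treated is not stated here, so confirm that against AuditChainVerifier before relying on an independent implementation.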

How to verify

Two surfaces, both backed by the same AuditChainVerifier helper.

From the Admin API

GET /api/admin/audit/verify-chain?table=admin
GET /api/admin/audit/verify-chain?table=object

Requires the audit.read permission. Sample response:

{
  "rowsChecked": 14283,
  "legacyRowsSkipped": 412,
  "brokenLinks": 0,
  "firstBrokenAuditId": null
}

A healthy chain has brokenLinks: 0 and firstBrokenAuditId: null.
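
A monitoring script could call the endpoint and apply the health rule above along these lines. The base URL and the Bearer-token header are assumptions made for the sketch; use whatever authentication your Admin API deployment actually requires.

```python
import json
import urllib.parse
import urllib.request

def verify_chain_url(base_url, table):
    """Build the verify-chain URL for table 'admin' or 'object'."""
    if table not in ("admin", "object"):
        raise ValueError("table must be 'admin' or 'object'")
    query = urllib.parse.urlencode({"table": table})
    return f"{base_url.rstrip('/')}/api/admin/audit/verify-chain?{query}"

def fetch_verify_chain(base_url, table, token):
    """GET the endpoint and decode the JSON body.

    The Authorization header is an assumption about how the API authenticates.
    """
    request = urllib.request.Request(
        verify_chain_url(base_url, table),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)

def chain_is_healthy(result):
    """Healthy means brokenLinks is 0 and firstBrokenAuditId is null."""
    return result["brokenLinks"] == 0 and result["firstBrokenAuditId"] is None

# The sample response from this runbook, checked offline.
sample = json.loads(
    '{"rowsChecked": 14283, "legacyRowsSkipped": 412,'
    ' "brokenLinks": 0, "firstBrokenAuditId": null}'
)
healthy = chain_is_healthy(sample)
```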

From the Sync Engine CLI

On the host running IdentityMesh.Service.exe:

IdentityMesh.Service.exe --verify-audit-chain

Sample output:

Verifying audit chains (ER-005)...
AdminAudit: RowsChecked=14283 LegacyRowsSkipped=412 BrokenLinks=0 FirstBrokenAuditId=(none)
ObjectAudit: RowsChecked=88112 LegacyRowsSkipped=2104 BrokenLinks=0 FirstBrokenAuditId=(none)
RESULT: chains intact

Exit code is 0 for intact chains, 2 if any breakage is detected. This makes the command suitable for a scheduled task or a monitoring agent that alerts on non-zero exit.
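
A scheduled wrapper might map the exit code to a status as sketched below. Only exit codes 0 and 2 are documented here; treating every other code as a verifier failure is an assumption of this sketch, and the function names are hypothetical.

```python
import subprocess
import sys

def run_verifier(command):
    """Run the verifier and map its exit code to a status string."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode == 0:
        status = "intact"
    elif completed.returncode == 2:
        status = "broken"
    else:
        # Assumption: any other code means the verifier itself did not run cleanly.
        status = "error"
    return status, completed.stdout

def main():
    """Entry point for a scheduled task; exits non-zero unless the chains are intact."""
    status, output = run_verifier(["IdentityMesh.Service.exe", "--verify-audit-chain"])
    print(output, end="")
    sys.exit(0 if status == "intact" else 1)
```

Keeping "broken" and "error" separate lets an alert distinguish a tamper finding from a verifier that could not reach the database.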

Operator guidance — when to run

Run --verify-audit-chain (or hit the API endpoint):

Legacy rows

Both audit tables existed before this feature shipped. Rows written before the migration carry RowHash IS NULL and PreviousRowHash IS NULL. The verifier reports these as LegacyRowsSkipped. They are intentionally outside the chain, because there’s no defensible way to retroactively hash data that was never committed to a chain at write time.

The legacy count drops monotonically as those rows age out of the retention window (see audit-retention.md); within one full retention period after the upgrade, every audit row will be chained.

If you see LegacyRowsSkipped increase between two runs, something has written NULL-hash rows after the migration — that should never happen via IdentityMesh code. Treat it as a tamper signal and check the engine and API logs.
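
One way to catch a rising legacy count is to keep the CLI output from the previous run and compare it with the current one. The parser below is a sketch that assumes the output keeps the "Table: Key=Value ..." layout shown in the sample output earlier in this runbook; the function names are hypothetical.

```python
import re

PAIR = re.compile(r"(\w+)=(\S+)")

def parse_verifier_output(text):
    """Parse 'Table: Key=Value ...' lines into {table: {key: value}}."""
    results = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        pairs = PAIR.findall(rest)
        if sep and pairs:            # skips the banner and the RESULT line
            results[name.strip()] = dict(pairs)
    return results

def legacy_increases(previous, current):
    """Return the tables whose LegacyRowsSkipped went up between two runs."""
    return [
        table for table in current
        if table in previous
        and int(current[table]["LegacyRowsSkipped"])
        > int(previous[table]["LegacyRowsSkipped"])
    ]

earlier = parse_verifier_output(
    "Verifying audit chains (ER-005)...\n"
    "AdminAudit: RowsChecked=14283 LegacyRowsSkipped=412 BrokenLinks=0 FirstBrokenAuditId=(none)\n"
    "ObjectAudit: RowsChecked=88112 LegacyRowsSkipped=2104 BrokenLinks=0 FirstBrokenAuditId=(none)\n"
    "RESULT: chains intact\n"
)
# Hypothetical later run in which three NULL-hash rows appeared in IM_AdminAudit.
later = parse_verifier_output(
    "AdminAudit: RowsChecked=14300 LegacyRowsSkipped=415 BrokenLinks=0 FirstBrokenAuditId=(none)\n"
    "ObjectAudit: RowsChecked=88200 LegacyRowsSkipped=2100 BrokenLinks=0 FirstBrokenAuditId=(none)\n"
)
suspicious = legacy_increases(earlier, later)
```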

What “broken” actually looks like

If the verifier reports brokenLinks > 0, take the firstBrokenAuditId, decode the bigint half (the first 8 bytes of the GUID, little-endian), and run:

SELECT TOP 5 *
FROM dbo.IM_AdminAudit  -- or IM_ObjectAudit
WHERE AuditId >= <decoded-id>
ORDER BY AuditId;

The first row in that list is the one that doesn’t reproduce. The two most likely causes:

  1. A column was edited. Compare against any backups or SIEM forwarders for the same WhenUtc; if the SIEM copy reproduces the stored RowHash and the live row does not, the live row is the one that was altered.
  2. A row was deleted. The firstBrokenAuditId row’s PreviousRowHash will not match the prior row’s RowHash. The gap shows you what got removed.

In both cases, escalate per the incident-response procedure for your organisation. The IR runbook bound to this control treats a non-zero brokenLinks count as a P1 finding.
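
The decode step for firstBrokenAuditId can be sketched as follows. This assumes the GUID’s bytes are laid out as .NET’s Guid.ToByteArray() produces them (which Python exposes as bytes_le) and that the bigint is signed; neither detail is confirmed in this runbook, so check against AuditChainVerifier. The encode helper and its zero-filled second half exist only to build a test value.

```python
import uuid

def decode_audit_id(guid_text):
    """Read the first 8 bytes of the GUID as a little-endian signed 64-bit integer.

    Assumption: byte order matches .NET Guid.ToByteArray(), i.e. uuid.UUID.bytes_le.
    """
    raw = uuid.UUID(guid_text).bytes_le
    return int.from_bytes(raw[:8], byteorder="little", signed=True)

def encode_audit_id(audit_id):
    """Hypothetical inverse, used here only to produce an example GUID."""
    raw = audit_id.to_bytes(8, byteorder="little", signed=True) + bytes(8)
    return str(uuid.UUID(bytes_le=raw))

example_guid = encode_audit_id(14283)
decoded = decode_audit_id(example_guid)   # value to use as <decoded-id> in the query
```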

Cross-references

The internal incident-response runbook (incident-response.md in the operator distribution) references this verifier as the persistence-check primitive in the §6.2 step that confirms attacker activity didn’t get scrubbed from the audit trail.