Reconciliation report (sync-completeness signal)

IdentityMesh exposes an on-demand reconciliation endpoint that surfaces drift between what the mesh believes and what the most recent successful sync run reported. This is the auditable sync-completeness signal that satisfies SOC 2 Processing Integrity (PI1.x) and ISO 27001 Annex A 8.16: “yesterday’s run touched N objects; current count is M; here’s the difference.”

What it is

A read-only aggregation over existing tables (IM_MeshObjects, IM_ManagementSpaceObjects, IM_RunHistory, IM_Connectors). No new state is written. The endpoint is cheap enough to run on demand against a healthy production database. The supported workflow is an operator running it after a DR restore, after a known out-of-band SQL edit, or as a weekly scheduled health check.

What it detects (phase 1)

  1. Run-vs-mesh count drift — most recent successful run for a connector reported N objects touched (Added + Updated + Deleted); current IM_ManagementSpaceObjects count for that connector is M. The difference (M - N, signed) is the drift since the last sync. Healthy steady state is zero or close to it; non-zero means something changed without going through the engine (a partial subsequent run, an out-of-band SQL edit, manual cleanup) or the mesh was tampered with.
  2. Orphan mesh objects — rows in IM_MeshObjects with ErasedAtUtc IS NULL that have no IM_ManagementSpaceObjects row referencing them. In a healthy deployment every live mesh object is owned by at least one connector-space object; an orphan indicates a partial delete or a broken import.
  3. Dangling connector-space rows — rows in IM_ManagementSpaceObjects whose MeshObjectId is non-null but the referenced mesh object doesn’t exist (or is erased). A dangling row is a broken join — the projection engine will misbehave on it until the broken state is repaired.
  4. Per-object-type breakdown — current counts grouped by ObjectType so an operator can see “12,438 Users / 1,250 Groups” rather than a single number.
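
The four checks above can be sketched as read-only queries. This is a minimal illustration against an in-memory SQLite database, not the engine's actual implementation: only the table names come from this runbook, while the column names (Id, ConnectorId, MeshObjectId, ObjectType), the helper names, and the choice to count live mesh objects per type are assumptions.

```python
import sqlite3

# Table names come from the runbook; every column name below is an assumption.
SCHEMA = """
CREATE TABLE IM_MeshObjects (
    Id TEXT PRIMARY KEY, ObjectType TEXT NOT NULL, ErasedAtUtc TEXT);
CREATE TABLE IM_ManagementSpaceObjects (
    Id TEXT PRIMARY KEY, ConnectorId TEXT NOT NULL, MeshObjectId TEXT);
"""

# Check 2: live mesh objects that no connector-space row references.
ORPHANS_SQL = """
SELECT COUNT(*) FROM IM_MeshObjects m
WHERE m.ErasedAtUtc IS NULL
  AND NOT EXISTS (SELECT 1 FROM IM_ManagementSpaceObjects c
                  WHERE c.MeshObjectId = m.Id)
"""

# Check 3: connector-space rows pointing at a missing or erased mesh object.
DANGLING_SQL = """
SELECT COUNT(*) FROM IM_ManagementSpaceObjects c
LEFT JOIN IM_MeshObjects m ON m.Id = c.MeshObjectId
WHERE c.MeshObjectId IS NOT NULL
  AND (m.Id IS NULL OR m.ErasedAtUtc IS NOT NULL)
"""

# Check 4: live mesh objects grouped by type.
BY_TYPE_SQL = """
SELECT ObjectType, COUNT(*) FROM IM_MeshObjects
WHERE ErasedAtUtc IS NULL GROUP BY ObjectType
"""


def run_vs_mesh_drift(current_count, added, updated, deleted):
    """Check 1: signed drift M - N, where N = Added + Updated + Deleted."""
    return current_count - (added + updated + deleted)


def reconcile(conn):
    """Run the read-only aggregations and return their results."""
    return {
        "orphans": conn.execute(ORPHANS_SQL).fetchone()[0],
        "dangling": conn.execute(DANGLING_SQL).fetchone()[0],
        "by_type": dict(conn.execute(BY_TYPE_SQL).fetchall()),
    }


def demo_db():
    """Build a tiny in-memory database with one orphan and two dangling rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO IM_MeshObjects VALUES (?, ?, ?)", [
        ("m1", "User", None),                     # healthy
        ("m2", "User", None),                     # orphan: no CS row
        ("m3", "Group", "2026-04-25T00:00:00Z"),  # erased
    ])
    conn.executemany("INSERT INTO IM_ManagementSpaceObjects VALUES (?, ?, ?)", [
        ("c1", "conn-a", "m1"),    # healthy join
        ("c2", "conn-a", "m3"),    # dangling: target erased
        ("c3", "conn-a", "m404"),  # dangling: target missing
        ("c4", "conn-a", None),    # not joined yet, so not dangling
    ])
    return conn
```

Calling reconcile(demo_db()) on this toy data reports one orphan and two dangling rows. A row with a null MeshObjectId is deliberately not counted as dangling, which matches the definition in item 3.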

What it does NOT detect today (phase 1 limitation)

Drift between mesh state and the upstream source. The original goal (“yesterday’s ServiceNow had 12,438 users; today it has 12,440, and here are the 2 deltas”) requires every connector to expose a count-only query method against its source. That is phase 2; it is documented as a follow-up and tracked separately.

In phase 1, the run-vs-mesh drift signal is a useful proxy: an out-of-band change to either side will show up as drift, even if we can’t yet attribute it to a specific source-system delta.

Endpoints

GET /api/admin/reconciliation — overall report

Returns the full reconciliation report: total counts, orphan + dangling counts, plus a per-connector summary.

curl -X GET https://identitymesh.example.com/api/admin/reconciliation \
     -H "Authorization: Bearer $TOKEN"

Response shape:

{
  "reportedAtUtc": "2026-04-26T14:33:21Z",
  "totalMeshObjects": 13688,
  "totalErasedMeshObjects": 4,
  "orphanMeshObjectCount": 0,
  "danglingConnectorSpaceCount": 0,
  "connectors": [
    {
      "connectorId": "9c1c...",
      "connectorName": "Entra-Production",
      "connectorType": "EntraId",
      "currentMeshObjectCount": 12438,
      "lastRunObjectsTouched": 12438,
      "lastSuccessfulRunUtc": "2026-04-26T03:15:00Z",
      "runVsMeshDrift": 0
    },
    {
      "connectorId": "a234...",
      "connectorName": "AD-OnPrem",
      "connectorType": "ActiveDirectory",
      "currentMeshObjectCount": 1250,
      "lastRunObjectsTouched": 1250,
      "lastSuccessfulRunUtc": "2026-04-26T02:45:00Z",
      "runVsMeshDrift": 0
    }
  ]
}

A drift report (something out of sync) looks like:

{
  "reportedAtUtc": "2026-04-26T14:33:21Z",
  "totalMeshObjects": 13690,
  "totalErasedMeshObjects": 4,
  "orphanMeshObjectCount": 2,
  "danglingConnectorSpaceCount": 1,
  "connectors": [
    {
      "connectorName": "Entra-Production",
      "currentMeshObjectCount": 12440,
      "lastRunObjectsTouched": 12438,
      "runVsMeshDrift": 2
    },
    {
      "connectorName": "AD-OnPrem",
      "currentMeshObjectCount": 1250,
      "lastRunObjectsTouched": 1250,
      "runVsMeshDrift": 0
    }
  ]
}

The runVsMeshDrift of 2 on Entra-Production, paired with the orphanMeshObjectCount of 2, is the operator’s lead: two mesh rows exist on this side without a connector-space (CS) pairing on the other.
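
A scheduled check could turn the overall report into a short list of findings. The sketch below is one possible shape, assuming the response has already been parsed into a dict; the field names are the ones shown in the response samples above, and the function name triage is hypothetical.

```python
def triage(report):
    """Return human-readable findings for a parsed reconciliation report."""
    findings = []
    # Any connector with non-zero signed drift is worth a closer look.
    for connector in report.get("connectors", []):
        drift = connector.get("runVsMeshDrift", 0)
        if drift != 0:
            findings.append(
                f"{connector['connectorName']}: run-vs-mesh drift {drift:+d}"
            )
    # Orphan and dangling counts are reported once for the whole mesh.
    orphans = report.get("orphanMeshObjectCount", 0)
    if orphans > 0:
        findings.append(f"{orphans} orphan mesh object(s)")
    dangling = report.get("danglingConnectorSpaceCount", 0)
    if dangling > 0:
        findings.append(f"{dangling} dangling connector-space row(s)")
    return findings
```

An empty list means the report is clean. Anything else is a prompt to pull the per-connector detail for the connectors named in the findings.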

GET /api/admin/reconciliation/{connectorId} — per-connector detail

Returns the same summary plus a per-ObjectType breakdown and the five most recent runs (any status — useful for spotting a string of failed runs preceding the drift).

curl -X GET https://identitymesh.example.com/api/admin/reconciliation/9c1c... \
     -H "Authorization: Bearer $TOKEN"

Response shape:

{
  "summary": {
    "connectorId": "9c1c...",
    "connectorName": "Entra-Production",
    "connectorType": "EntraId",
    "currentMeshObjectCount": 12440,
    "lastRunObjectsTouched": 12438,
    "lastSuccessfulRunUtc": "2026-04-26T03:15:00Z",
    "runVsMeshDrift": 2
  },
  "byObjectType": [
    { "objectType": "Group", "currentCount": 1250 },
    { "objectType": "User",  "currentCount": 11190 }
  ],
  "recentRuns": [
    { "runId": "...", "status": "Success", "added": 5, "updated": 12433, "deleted": 0 },
    { "runId": "...", "status": "Success", "added": 0, "updated": 12438, "deleted": 0 },
    { "runId": "...", "status": "Failed",  "added": 0, "updated": 0, "deleted": 0 }
  ]
}
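
The detail response lends itself to a few quick consistency checks before deeper investigation. The following is a rough sketch with a hypothetical helper name, check_detail. It assumes the byObjectType counts sum to currentMeshObjectCount, which holds in the sample above but is not stated as a guarantee, and it makes no assumption about the ordering of recentRuns.

```python
def check_detail(detail):
    """Summarise a parsed per-connector detail response."""
    summary = detail["summary"]
    # Sum of the per-type counts, compared against the connector total.
    type_total = sum(t["currentCount"] for t in detail["byObjectType"])
    # Runs that did not succeed often precede (and explain) drift.
    non_success = [r for r in detail["recentRuns"] if r["status"] != "Success"]
    return {
        "type_total_matches": type_total == summary["currentMeshObjectCount"],
        "recomputed_drift": (
            summary["currentMeshObjectCount"] - summary["lastRunObjectsTouched"]
        ),
        "non_success_runs": len(non_success),
    }
```

For the sample response above, the per-type counts (1250 + 11190) match the connector total of 12440, the recomputed drift is 2, and one of the listed runs did not succeed.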

Operator workflow

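Recommended cadence: weekly as a scheduled health check, plus on demand after a DR restore or a known out-of-band SQL edit.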

When drift is detected:

  1. Pull the per-connector detail (/api/admin/reconciliation/{id}). Cross-reference recentRuns: a failed run preceding the drift often explains it (the engine started a delta, deleted some tracking rows, then crashed before repopulating them).
  2. Pull the audit log (/api/admin/audit) for the same window — any ErasureRequest rows account for totalErasedMeshObjects moving up; any DBA-tagged actor entries account for out-of-band edits.
  3. To clean up dangling CS rows safely: take a transactional backup, then run a targeted DELETE against the broken rows (the orphan / dangling counts in the report tell you the expected delete count — confirm before commit). Re-run the reconciliation report to confirm the counts dropped to zero.
  4. If drift persists across runs and can’t be attributed to out-of-band edits, treat it as a sync-engine bug — capture the reconciliation snapshot + run history and open a support ticket.
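
The "confirm before commit" guard in step 3 can be sketched as a transaction that rolls back when the affected row count does not match the count from the report. This uses SQLite as a stand-in for the production database, with assumed column names (Id, MeshObjectId, ErasedAtUtc) and a hypothetical helper name; adapt the SQL and transaction handling to the actual engine and schema, and take the backup first.

```python
import sqlite3

# Column names are assumptions; only the table names come from the runbook.
DELETE_DANGLING_SQL = """
DELETE FROM IM_ManagementSpaceObjects
WHERE MeshObjectId IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM IM_MeshObjects m
                  WHERE m.Id = IM_ManagementSpaceObjects.MeshObjectId
                    AND m.ErasedAtUtc IS NULL)
"""


def delete_dangling(conn, expected_count):
    """Delete dangling rows; commit only if the affected row count matches."""
    # sqlite3 opens a transaction implicitly before the DELETE runs.
    cur = conn.execute(DELETE_DANGLING_SQL)
    if cur.rowcount != expected_count:
        # Mismatch: the report is stale or the predicate is wrong. Undo.
        conn.rollback()
        return False
    conn.commit()
    return True
```

Pass the danglingConnectorSpaceCount from a fresh report as expected_count. A return value of False means nothing was deleted and the report should be pulled again before retrying.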

Permissions

| Permission | Endpoints | Recommended role |
| --- | --- | --- |
| audit.read | GET /api/admin/reconciliation, GET /api/admin/reconciliation/{id} | Operator, Auditor |

The reconciliation endpoint is gated on the same audit.read permission as the audit chain verifier (audit-chain.md) and the audit log itself — same audience, same evidence-grade trust level.

Compliance mapping

| Standard | Control | What this satisfies |
| --- | --- | --- |
| SOC 2 | PI1.1, PI1.2 (Processing Integrity) | Demonstrable sync-completeness signal with on-demand evidence |
| SOC 2 | PI1.4, PI1.5 | Detection of incomplete or inaccurate processing |
| ISO 27001:2022 | Annex A 8.16 (Monitoring activities) | Continuous monitoring of data-flow integrity |
| ISO 27001:2022 | Annex A 8.34 (Protection of information systems during audit testing) | Read-only signal safe to invoke during audit |

The reconciliation response is itself ephemeral: it is recomputed on each call and not persisted. For a long-term record, fold the response into your monitoring system’s metrics history, using the reportedAtUtc field as the authoritative timestamp for the snapshot.
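
One way to fold a response into a metrics history is to flatten it into timestamped points before handing them to whatever monitoring client is in use. The sketch below is illustrative only: the metric names and the tuple layout are hypothetical, and the only fields read are the ones shown in the response samples above.

```python
from datetime import datetime


def to_metric_points(report):
    """Flatten a parsed report into (name, labels, value, timestamp) tuples."""
    # reportedAtUtc is the snapshot timestamp. Swap the trailing "Z" for an
    # explicit offset so fromisoformat also parses it on Python < 3.11.
    ts = datetime.fromisoformat(report["reportedAtUtc"].replace("Z", "+00:00"))
    points = [
        ("identitymesh_orphan_mesh_objects", {},
         report["orphanMeshObjectCount"], ts),
        ("identitymesh_dangling_cs_rows", {},
         report["danglingConnectorSpaceCount"], ts),
    ]
    # One drift point per connector, labelled with the connector name.
    for connector in report["connectors"]:
        labels = {"connector": connector["connectorName"]}
        points.append(
            ("identitymesh_run_vs_mesh_drift", labels,
             connector["runVsMeshDrift"], ts)
        )
    return points
```

Stamping every point with reportedAtUtc rather than the collection time keeps the stored history aligned with the snapshot the report actually describes.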