Deployment Architecture

This document states explicitly what IdentityMesh’s supported deployment topology is — and, as importantly, what it is not — so operators don’t accidentally deploy in a configuration the code doesn’t honour.

The supported topology

   ┌────────────────────────────────┐
   │  IdentityMesh Admin API        │ ← one instance
   │  (Windows service + Kestrel)   │
   └──────────────┬─────────────────┘
                  │ EF Core, TLS
                  ▼
   ┌────────────────────────────────┐
   │  SQL Server                    │ ← can be clustered /
   │  (IdentityMesh database)       │   AG'd / mirrored, but
   └──────────────┬─────────────────┘   from IdentityMesh's
                  ▲                     perspective it's "the DB"
                  │ EF Core, TLS
   ┌──────────────┴─────────────────┐
   │  IdentityMesh Sync Engine      │ ← one instance
   │  (Windows service)             │
   └──────────────┬─────────────────┘
                  │ SignalR (hub client)
                  ▼
   ┌────────────────────────────────┐
   │  Relay agents (≥ 0)            │ ← one per remote host,
   │  (Windows services)            │   independent
   └────────────────────────────────┘

One Admin API, one Sync Engine, one SQL database, zero-or-more Relay agents. The Admin API and the Sync Engine can share a host or sit on separate hosts; the product behaves the same either way. Relay agents are designed to be remote.

What’s per-instance vs shared

| State | Location | Scaled out? |
|---|---|---|
| Identity data (IM_MeshObjects, attributes, audit, composer) | SQL | Yes: SQL Server handles the HA story |
| License (license.key) | CommonApplicationData\IdentityMesh\ | No: per-host file |
| Secrets (IM_Secrets) | SQL (DPAPI-encrypted under the host’s LocalMachine key) | Partially: blobs travel, decryption key does not. See secrets-and-dpapi.md. |
| ASP.NET Core Data Protection keys | Admin API host memory (default provider) | No: AddDataProtection().PersistKeysTo… is not wired |
| SignalR hub state (connection lookups, group membership) | Admin API host memory | No: no Redis / Service Bus backplane |
| In-memory caches (join rules, composer rules, attribute-flow rules, projection rules, export queue working set) | Sync Engine scope memory | No: each scope is a run |
| Engine instance registration (the engine-instance registry) | SQL | Table supports N rows; the scheduler doesn’t read it for coordination yet. |

The pattern: SQL holds everything durable; everything ephemeral lives in the instance’s process memory.

Why scale-out isn’t supported today

Three specific gaps, each independently sufficient to break a scale-out deployment:

  1. Data Protection keys aren’t persisted to a shared store. Cookies, antiforgery tokens, and any other data-protected payload minted by one Admin API instance wouldn’t decrypt on a second instance. The moment a sticky-session load balancer fell back to a different instance, the session would break.
  2. SignalR has no backplane. A relay agent connected to instance A would be invisible to instance B. Admin UI operations that broadcast commands to relays (sync triggers, config reloads) from instance B wouldn’t reach the relay.
  3. Engine run coordination isn’t wired. Two Sync Engine instances pointing at the same SQL database would both schedule the same connectors at the same times. That is not a correctness bug (each object touches its own watermark + transaction), but it is wasteful, and composer-rule mutations from two engines could interleave in surprising ways. The engine-instance registry table exists for a future leader-election implementation; it isn’t consulted today.
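
The first gap is easy to reproduce in isolation. The following is a minimal sketch, not IdentityMesh code: it assumes a console project that references the Microsoft.AspNetCore.DataProtection package, and it uses two EphemeralDataProtectionProvider objects as stand-ins for two Admin API instances that each hold their own in-memory key ring. The purpose string and the payload are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using Microsoft.AspNetCore.DataProtection;

// Two providers with independent in-memory key rings. Each one is a
// stand-in (assumption) for a separate Admin API instance whose keys
// are not persisted to a shared store.
var instanceA = new EphemeralDataProtectionProvider();
var instanceB = new EphemeralDataProtectionProvider();

// Same purpose string on both sides, as two copies of the same app would use.
// "IdentityMesh.Sketch.Session" is a hypothetical purpose name.
IDataProtector protectorA = instanceA.CreateProtector("IdentityMesh.Sketch.Session");
IDataProtector protectorB = instanceB.CreateProtector("IdentityMesh.Sketch.Session");

// Instance A mints a protected payload (think: cookie or antiforgery token).
string token = protectorA.Protect("user=alice");

// The minting instance can read its own payload back.
Console.WriteLine($"Instance A round-trip: {protectorA.Unprotect(token)}");

// The second instance has a different key ring, so Unprotect throws.
try
{
    protectorB.Unprotect(token);
    Console.WriteLine("Instance B decrypted the payload (not expected).");
}
catch (CryptographicException ex)
{
    Console.WriteLine($"Instance B cannot decrypt: {ex.GetType().Name}");
}
```

The identical purpose string does not help here: without a shared key ring, the second provider has no key matching the one that protected the payload, which is the same failure a load balancer would surface when a request lands on a different instance.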

Additionally:

Supported failure modes

Not-supported failure modes (today)

What scale-out would take

Not a commitment; an inventory for a future ER that would implement active/active:

For most enterprises the simpler active/passive topology covers the common HA ask — one instance live, a second instance ready to take over via a standard MSCS cluster role or a manual cutover procedure. That sidesteps the distributed-state complexity entirely.

Checking the topology at runtime