Backup and Restore

This document covers what to back up, the order to restore in, and the gotchas that bite when something goes wrong at 2am.

What needs to be backed up

| Item | Location | Notes |
|---|---|---|
| SQL database | The IdentityMesh DB on the configured SQL Server. | Authoritative for connectors, schedules, mesh objects, run history, audit, secret blobs. |
| License file | %CommonApplicationData%\IdentityMesh\license.key (on the host running the service / API). | Signed RSA-4096 license payload. Without it, the service falls back to Starter trial after a reload. |
| Trial marker | %CommonApplicationData%\IdentityMesh\trial-started.dat | Holds the trial-start UTC timestamp. Lose it and the 30-day trial clock restarts (mostly cosmetic if you have a real license). |
| Connector DLLs | Connectors\ subfolder of the relay agent install dir, and the matching folder for the in-process service. | Not in the DB. Replaced at upgrade time; see upgrades.md. |
| Service binaries + appsettings | Service install dir under %ProgramFiles%\IdentityMesh\ (or wherever the MSI deployed to). | Re-installable via the MSI; only the appsettings.{Environment}.json overrides need backup if customized. |
| Relay agent appsettings | %ProgramFiles%\IdentityMeshRelayAgent\appsettings.json on each agent host. | Contains Relay:HubUrl, Relay:AgentId, Relay:ApiKey. Re-keyable but operationally easier to back up. |
| Log files | logs\*.log in each component’s working directory (Serilog rolling daily). | Useful for forensics; not required for service recovery. |

What does NOT survive a restore as-is

The most important caveat: IM_Secrets blobs are DPAPI-encrypted under the host’s LocalMachine key. They restore intact byte-for-byte but cannot be decrypted on a different host. See secrets-and-dpapi.md for full detail.

This means: a SQL backup is sufficient for data recovery on the same host. For host migration or DR to a different machine, you also need to re-provision every secret via secretscli set <ref> <value>.
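
The re-provisioning step can be scripted so it is repeatable during a DR event. The following is a minimal sketch, not a shipped tool: only the `secretscli set <ref> <value>` form comes from this document, while the function names, the secret refs, and the idea of feeding values from a mapping are assumptions for illustration.

```python
import subprocess
from typing import Mapping


def build_secret_commands(secrets: Mapping[str, str]) -> list[list[str]]:
    """Return one `secretscli set <ref> <value>` argv list per secret, sorted by ref."""
    return [["secretscli", "set", ref, value] for ref, value in sorted(secrets.items())]


def reprovision(secrets: Mapping[str, str], dry_run: bool = True) -> int:
    """Run (or, in dry-run mode, only list) each command; return the number handled."""
    commands = build_secret_commands(secrets)
    for argv in commands:
        if dry_run:
            # Print the ref only; never echo the secret value.
            print(" ".join(argv[:3]), "<value hidden>")
        else:
            # check=True stops at the first failure so a partial run is noticed.
            subprocess.run(argv, check=True)
    return len(commands)


if __name__ == "__main__":
    # Hypothetical refs and values; real ones come from your password manager.
    example = {
        "connector/hr/password": "example-1",
        "connector/ad/bind-password": "example-2",
    }
    reprovision(example, dry_run=True)
```

Running in dry-run mode first gives the operator a checklist of refs to confirm before any value is written on the new host.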

Backup procedure

Daily / scheduled

  1. SQL full backup (or differential on top of a recent full):

    BACKUP DATABASE IdentityMesh
        TO DISK = 'D:\Backups\IdentityMesh_full.bak'
        WITH FORMAT, INIT, COMPRESSION, CHECKSUM;

    Retain per your standard policy. Run RESTORE VERIFYONLY against the backup file as a smoke test.

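  2. License file and trial marker:

    robocopy "%ProgramData%\IdentityMesh" "D:\Backups\IdentityMesh\config" license.key trial-started.dat

    %ProgramData% is the environment variable that resolves to the CommonApplicationData folder; cmd does not define a %CommonApplicationData% variable.

  3. Relay agent configs (per agent host):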

    robocopy "%ProgramFiles%\IdentityMeshRelayAgent" "\\backup\share\relay-{agentname}" appsettings.json
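
The full backup in step 1 writes to a fixed file name with INIT, so each run overwrites the previous backup. If your retention policy keeps several days of backups, one option is a date-stamped file name paired with the RESTORE VERIFYONLY smoke test. The sketch below only generates the two T-SQL statements; the helper name and the date-stamped naming scheme are assumptions, not something the product prescribes.

```python
from datetime import date


def backup_statements(day: date, backup_dir: str = r"D:\Backups") -> tuple[str, str]:
    """Build the BACKUP and RESTORE VERIFYONLY statements for one day's full backup."""
    # Hypothetical naming scheme: one file per calendar day.
    path = f"{backup_dir}\\IdentityMesh_full_{day:%Y%m%d}.bak"
    backup = (
        "BACKUP DATABASE IdentityMesh\n"
        f"    TO DISK = '{path}'\n"
        "    WITH FORMAT, INIT, COMPRESSION, CHECKSUM;"
    )
    # WITH CHECKSUM re-validates the page checksums written by the backup.
    verify = (
        "RESTORE VERIFYONLY\n"
        f"    FROM DISK = '{path}'\n"
        "    WITH CHECKSUM;"
    )
    return backup, verify


if __name__ == "__main__":
    for statement in backup_statements(date.today()):
        print(statement)
        print()
```

The generated statements can be pasted into a SQL Agent job step or run through your usual SQL client; how they are executed is left to the deployment.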

One-time / on change

Restore procedure

Same-host recovery (host hasn’t changed)

  1. Stop the IdentityMesh services:

    sc stop IdentityMeshEngine
    sc stop IdentityMeshAdmin

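    sc stop returns before the service has finished stopping, so confirm with sc query that both services report STOPPED before you restore. Also stop relay agent services on remote hosts if they would otherwise retry imports against an inconsistent state.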

  2. Restore SQL:

    RESTORE DATABASE IdentityMesh
        FROM DISK = 'D:\Backups\IdentityMesh_full.bak'
        WITH REPLACE, RECOVERY;

  3. Restore the license file to %CommonApplicationData%\IdentityMesh\ if it was lost.

  4. Start services:

    sc start IdentityMeshEngine
    sc start IdentityMeshAdmin

  5. Verify via the Admin UI dashboard (run history, license status, connector list) and a smoke-test sync run.

Different-host recovery (host migration / DR)

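Same steps as above, plus re-provisioning every secret on the new host with secretscli set <ref> <value>, because the restored IM_Secrets blobs cannot be decrypted there (see secrets-and-dpapi.md).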

Restore order matters

Tables have FK chains (mesh objects → relationships, attributes, audit; runs → connector logs). EF migrations create the dependency graph; SQL restore is atomic so order is moot for restore itself. Order only matters for:

Recovery objectives

These are the RPO / RTO targets IdentityMesh commits to as a product baseline. They’re grounded in what the current single-instance + shared-SQL architecture actually delivers — not aspirational numbers. Customer-specific SLAs can be tighter, but only by adding backup infrastructure (log shipping, AlwaysOn AG, warm standby) around the product, not by changing the product itself.

RPO (data loss on failure)

RPO is a direct function of your SQL backup cadence. IdentityMesh itself produces no state outside SQL and the license file — both easy to back up frequently.

| Tier | SQL backup strategy | RPO target | Extra setup |
|---|---|---|---|
| Default | Daily full (provided sample in this doc). | ≤ 24 h | None; matches the shipped example. |
| Standard | Daily full + hourly differential. | ≤ 1 h | SQL Agent job for the differential. |
| Enterprise | Daily full + 15-min transaction log backups. | ≤ 15 min | SQL set to FULL recovery model + log backup job. |
| Zero-loss | AlwaysOn Availability Group with synchronous commit to a secondary. | ≈ 0 (committed transactions preserved) | AG cluster, separate network. |

The license file doesn’t contribute to RPO — it only determines tier limits and can be re-issued on request if the file itself is lost.
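
To make the tier table easier to apply, the sketch below picks the least demanding tier that still meets a required RPO. The tier names and targets are copied from the table; the function name is hypothetical, and treating the Zero-loss target as exactly 0 minutes is a simplification of the "≈ 0" in the table.

```python
# (tier, RPO target in minutes) from the table above, loosest first.
RPO_TIERS = [
    ("Default", 24 * 60),
    ("Standard", 60),
    ("Enterprise", 15),
    ("Zero-loss", 0),  # Simplification: the table says "≈ 0".
]


def tier_for_rpo(max_loss_minutes: float) -> str:
    """Return the least demanding tier whose RPO target fits the requirement."""
    if max_loss_minutes < 0:
        raise ValueError("max_loss_minutes must be non-negative")
    for name, target_minutes in RPO_TIERS:
        if target_minutes <= max_loss_minutes:
            return name
    # Unreachable for non-negative input because Zero-loss has target 0.
    raise ValueError("no tier satisfies the requirement")


if __name__ == "__main__":
    for requirement in (24 * 60, 240, 30, 5):
        print(requirement, "min ->", tier_for_rpo(requirement))
```

For example, a requirement of at most 4 hours of data loss maps to Standard, because the Default tier's 24 h target is too loose and the hourly differential already satisfies it.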

RTO (time to recover)

RTO is dominated by three serial steps: SQL restore time, service cycle + verification, and (DR only) secret re-provisioning. Restore time scales with database size, so the commitment is tiered.

| Dataset size | Same-host RTO | DR-to-new-host RTO |
|---|---|---|
| ≤ 100k mesh objects | ≤ 1 h | ≤ 2 h |
| ≤ 1M mesh objects | ≤ 2 h | ≤ 4 h |
| ≤ 10M mesh objects | ≤ 4 h | ≤ 8 h |

The DR-to-new-host column includes the mandatory secret re-provisioning step: every entry in IM_Secrets must be re-set via secretscli because DPAPI-encrypted blobs don’t travel between machines (see secrets-and-dpapi.md). Budget ~30 seconds per secret for an operator who has the values handy; more if you have to fetch them from a password manager first. Rule of thumb: DR RTO ≈ restore-and-verify time for your database size + (seconds_per_secret × secret_count).
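
The rule of thumb can be turned into a quick check of whether a deployment's secret count still fits the published DR target. This is a rough sketch under a stated assumption: it uses the same-host RTO target as a stand-in for restore-and-verify time, which is an upper bound rather than a measurement. The function names are hypothetical; the numbers come from the table and the 30-second budget above.

```python
# Published RTO targets from the table above: (max mesh objects, hours).
SAME_HOST_RTO_HOURS = [(100_000, 1), (1_000_000, 2), (10_000_000, 4)]
DR_RTO_HOURS = [(100_000, 2), (1_000_000, 4), (10_000_000, 8)]


def published_target_hours(mesh_objects: int, targets: list[tuple[int, int]]) -> int:
    """Look up the published target for the smallest size band that fits."""
    for max_objects, hours in targets:
        if mesh_objects <= max_objects:
            return hours
    raise ValueError("dataset is larger than the largest published size band")


def estimate_dr_hours(mesh_objects: int, secret_count: int,
                      seconds_per_secret: float = 30.0) -> float:
    """Restore-and-verify time plus secret re-provisioning time, in hours.

    Assumption: the same-host target stands in for restore-and-verify time.
    """
    restore_hours = published_target_hours(mesh_objects, SAME_HOST_RTO_HOURS)
    secret_hours = seconds_per_secret * secret_count / 3600
    return restore_hours + secret_hours


def fits_dr_target(mesh_objects: int, secret_count: int,
                   seconds_per_secret: float = 30.0) -> bool:
    """True if the estimate stays within the published DR-to-new-host target."""
    estimate = estimate_dr_hours(mesh_objects, secret_count, seconds_per_secret)
    return estimate <= published_target_hours(mesh_objects, DR_RTO_HOURS)


if __name__ == "__main__":
    print(estimate_dr_hours(1_000_000, 120), fits_dr_target(1_000_000, 120))
    print(estimate_dr_hours(1_000_000, 300), fits_dr_target(1_000_000, 300))
```

Under these assumptions, 1M mesh objects with 120 secrets gives an estimate of 3.0 h, inside the 4 h DR target, while 300 secrets gives 4.5 h and exceeds it. A measured restore time from a drill is a better input than the published same-host target.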

What invalidates these targets

These numbers assume:

Validating the targets

See the DR drill checklist below. The drill is the source of truth for whether the RTO commitment actually holds for your deployment — if a drill ran long, update the target, don’t leave the published number aspirational.

DR drill checklist

Run this annually: