Backup and Restore

This document covers what to back up, the order to restore in, and the gotchas that bite when something goes wrong at 2am.

What needs to be backed up

Item	Location	Notes
SQL database	The `IdentityMesh` DB on the configured SQL Server.	Authoritative for connectors, schedules, mesh objects, run history, audit, secret blobs.
License file	`%CommonApplicationData%\IdentityMesh\license.key` (on the host running the service / API).	Signed RSA-4096 license payload. Without it, the service falls back to Starter trial after a reload.
Trial marker	`%CommonApplicationData%\IdentityMesh\trial-started.dat`	Holds the trial-start UTC timestamp. Lose it and the 30-day trial clock restarts (mostly cosmetic if you have a real license).
Connector DLLs	`Connectors\` subfolder of the relay agent install dir, and the matching folder for the in-process service.	Not in the DB. Replaced at upgrade time; see `upgrades.md`.
Service binaries + appsettings	Service install dir under `%ProgramFiles%\IdentityMesh\` (or wherever the MSI deployed to).	Re-installable via the MSI; only the `appsettings.{Environment}.json` overrides need backup if customized.
Relay agent appsettings	`%ProgramFiles%\IdentityMeshRelayAgent\appsettings.json` on each agent host.	Contains `Relay:HubUrl`, `Relay:AgentId`, `Relay:ApiKey`. Re-keyable but operationally easier to back up.
Log files	`logs\*.log` in each component’s working directory (Serilog rolling daily).	Useful for forensics; not required for service recovery.
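A backup job that quietly stops running is a classic 2am surprise. One way to catch it early is a small check over the backup copies of the items above. This is a sketch only, assuming Python is available on the backup host. The manifest paths mirror the sample commands in this document, and the 26-hour threshold (a day plus slack) is an arbitrary choice, so adjust both. Only the SQL backup gets an age check: robocopy keeps the source file's timestamps by default, so the copied license.key and trial-started.dat can legitimately be months old.

```python
import time
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Hypothetical manifest: item -> (backup copy path, max age in hours or None).
# Paths follow the sample commands in this document; change them to match yours.
BACKUP_MANIFEST: Dict[str, Tuple[Path, Optional[float]]] = {
    # Rewritten on every run, so its modification time says when the job last ran.
    "SQL database": (Path(r"D:\Backups\IdentityMesh_full.bak"), 26.0),
    # robocopy keeps the source timestamps, so only check that these exist.
    "License file": (Path(r"D:\Backups\IdentityMesh\config\license.key"), None),
    "Trial marker": (Path(r"D:\Backups\IdentityMesh\config\trial-started.dat"), None),
}


def check_backups(manifest: Dict[str, Tuple[Path, Optional[float]]],
                  now: Optional[float] = None) -> List[str]:
    """Return human-readable problems: missing, empty or stale backup copies."""
    now = time.time() if now is None else now
    problems: List[str] = []
    for item, (path, max_age_hours) in manifest.items():
        if not path.is_file():
            problems.append(f"{item}: missing ({path})")
            continue
        info = path.stat()
        if info.st_size == 0:
            problems.append(f"{item}: empty ({path})")
        if max_age_hours is not None:
            age_hours = (now - info.st_mtime) / 3600
            if age_hours > max_age_hours:
                problems.append(f"{item}: stale, {age_hours:.1f} h old ({path})")
    return problems
```

Scheduled after the backup window, an alert on a non-empty `check_backups(BACKUP_MANIFEST)` result is enough to notice a dead job the next morning rather than during a restore.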

What does NOT survive a restore as-is

The most important caveat: IM_Secrets blobs are DPAPI-encrypted under the host’s LocalMachine key. They restore intact byte-for-byte but cannot be decrypted on a different host. See secrets-and-dpapi.md for full detail.

This means: a SQL backup is sufficient for data recovery on the same host. For host migration or DR to a different machine, you also need to re-provision every secret via secretscli set <ref> <value>.

Backup procedure

Daily / scheduled

SQL full backup (or differential on top of a recent full):
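```
-- Full backup; FORMAT/INIT overwrite this file on every run.
BACKUP DATABASE IdentityMesh
    TO DISK = 'D:\Backups\IdentityMesh_full.bak'
    WITH FORMAT, INIT, COMPRESSION, CHECKSUM;
-- Smoke test: re-reads the file and verifies the backup checksums.
RESTORE VERIFYONLY FROM DISK = 'D:\Backups\IdentityMesh_full.bak' WITH CHECKSUM;
```
Because FORMAT and INIT overwrite the same file on each run, copy each day's backup off-host (or switch to date-stamped file names) and retain the copies per your standard policy. RESTORE VERIFYONLY is only a smoke test: it confirms the backup file is complete and readable, not that the data inside it is consistent.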

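License file and trial marker:

```
robocopy "%ProgramData%\IdentityMesh" "D:\Backups\IdentityMesh\config" license.key trial-started.dat
```

%CommonApplicationData% is the .NET special-folder name, not an environment variable that cmd expands; %ProgramData% points at the same folder.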

Relay agent configs (per agent host):

```
robocopy "%ProgramFiles%\IdentityMeshRelayAgent" "\\backup\share\relay-{agentname}" appsettings.json
```
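If the robocopy lines run from a scheduled task or script, watch the exit code. robocopy returns a bitmask in which 1 means "files were copied", so a nonzero code is normal; only 8 or higher signals copy failures. A minimal sketch of a wrapper that applies that rule, assuming Python on the host (the function names are illustrative, not part of IdentityMesh):

```python
import os
import subprocess
from typing import List, Sequence

# robocopy exit-code bits; any code below 8 means nothing failed.
ROBOCOPY_BITS = {
    1: "files copied",
    2: "extra files or dirs in destination",
    4: "mismatched files or dirs",
    8: "some files could not be copied",
    16: "fatal error, nothing copied",
}


def robocopy_succeeded(exit_code: int) -> bool:
    return 0 <= exit_code < 8


def describe_robocopy(exit_code: int) -> List[str]:
    return [msg for bit, msg in ROBOCOPY_BITS.items() if exit_code & bit] or ["nothing to copy"]


def run_robocopy(source: str, dest: str, files: Sequence[str]) -> int:
    """Windows-only. subprocess does not expand %VARS%, so expand them explicitly."""
    argv = ["robocopy", os.path.expandvars(source), os.path.expandvars(dest), *files]
    code = subprocess.run(argv, check=False).returncode
    if not robocopy_succeeded(code):
        raise RuntimeError(f"robocopy exit {code}: {', '.join(describe_robocopy(code))}")
    return code
```

For example, `run_robocopy(r"%ProgramData%\IdentityMesh", r"D:\Backups\IdentityMesh\config", ["license.key", "trial-started.dat"])` returns 1 on a normal day and raises only when something was not copied.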

One-time / on change

After an installer upgrade: snapshot the Connectors\ folders for the service and for each relay agent so you can roll back if a connector regresses.
After issuing a new license file: snapshot the new license.key.
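The Connectors\ snapshot can be a plain timestamped copy. A sketch assuming Python on each host; the folder paths you pass in are yours to fill, since the in-process service's Connectors\ folder depends on where the MSI installed it, and the helper name is hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def snapshot_folder(src: Path, dest_root: Path, label: str,
                    when: Optional[datetime] = None) -> Path:
    """Copy src to dest_root/<label>-<YYYYMMDD-HHMMSS> and return the new folder."""
    when = when or datetime.now()
    target = dest_root / f"{label}-{when:%Y%m%d-%H%M%S}"
    # copytree creates missing parent folders and refuses to write into an
    # existing target, so two snapshots are never silently merged.
    shutil.copytree(src, target)
    return target
```

Rolling back a regressed connector is then a matter of copying the DLLs from the newest snapshot taken before the upgrade.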

Restore procedure

Same-host recovery (host hasn’t changed)

Stop the IdentityMesh services:
```
sc stop IdentityMeshEngine
sc stop IdentityMeshAdmin
```
Stop relay agent services on remote hosts if they’re going to retry imports against an inconsistent state.

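Restore SQL. Setting the database to SINGLE_USER first rolls back and disconnects any sessions still open, because RESTORE ... WITH REPLACE needs exclusive access (skip that line if the database doesn't exist yet, as on a new SQL Server):

```
USE master;
ALTER DATABASE IdentityMesh SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
RESTORE DATABASE IdentityMesh
    FROM DISK = 'D:\Backups\IdentityMesh_full.bak'
    WITH REPLACE, RECOVERY;
ALTER DATABASE IdentityMesh SET MULTI_USER;
```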

Restore the license file to %CommonApplicationData%\IdentityMesh\ if it was lost.

Start services:

```
sc start IdentityMeshEngine
sc start IdentityMeshAdmin
```

Verify via the Admin UI dashboard (run history, license status, connector list) and a smoke-test sync run.
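One gotcha in the stop step: sc stop only sends the stop request and returns, usually while the service is still in STOP_PENDING, so a restore started immediately can collide with the engine's open connections. net stop waits (up to a timeout) for the service to stop; for scripted runs, a polling sketch in Python is below. It assumes the usual English `STATE : 1  STOPPED` line in sc query output, and the helper names are hypothetical.

```python
import re
import subprocess
import time

# Matches lines like "        STATE              : 1  STOPPED".
_STATE_RE = re.compile(r"^\s*STATE\s*:\s*\d+\s+(\w+)", re.MULTILINE)


def parse_sc_state(sc_query_output: str) -> str:
    """Return the state name (RUNNING, STOP_PENDING, STOPPED, ...) from sc query output."""
    match = _STATE_RE.search(sc_query_output)
    if match is None:
        raise ValueError("no STATE line in sc query output")
    return match.group(1)


def wait_until_stopped(service: str, timeout_s: float = 120.0, poll_s: float = 2.0) -> None:
    """Windows-only: poll `sc.exe query <service>` until it reports STOPPED."""
    deadline = time.monotonic() + timeout_s
    while True:
        out = subprocess.run(["sc.exe", "query", service],
                             capture_output=True, text=True, check=False).stdout
        if parse_sc_state(out) == "STOPPED":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{service} did not stop within {timeout_s} s")
        time.sleep(poll_s)
```

Calling `wait_until_stopped` for IdentityMeshEngine and IdentityMeshAdmin after the two sc stop commands makes the restore start only once both services are really down.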

Different-host recovery (host migration / DR)

Same steps as above, plus:

After the service is up, every secret in IM_Secrets will fail to decrypt with an InvalidOperationException (behavior since the C3 hardening; earlier builds failed silently). The first sync run that needs an authenticated connector will surface this.

Re-provision every secret on the new host:

```
secretscli set secret://ad/svc-imadmin/password "..."
secretscli set secret://sql/hr-source/password "..."
```

Restart the service after re-provisioning so any cached license / auth material is reloaded.
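With many secrets, re-provisioning by hand is slow and easy to get wrong. A sketch that loops the documented `secretscli set <ref> <value>` form over a ref-to-value mapping. It assumes you assemble that mapping from your secret manager (the ref list could come from IM_Secrets, but its column layout isn't documented here) and that secretscli exits nonzero on failure, which this document does not state. The helper names are hypothetical.

```python
import subprocess
from typing import List, Mapping


def secretscli_argv(ref: str, value: str) -> List[str]:
    # Same shape as the documented command: secretscli set <ref> <value>
    return ["secretscli", "set", ref, value]


def reprovision(secrets: Mapping[str, str], dry_run: bool = True) -> List[str]:
    """Set every secret; return the refs whose command failed.

    dry_run=True only lists the refs, so the set can be reviewed first.
    """
    failed: List[str] = []
    for ref, value in secrets.items():
        if dry_run:
            print(f"would set {ref}")
            continue
        # An argv list (no shell) passes quotes, spaces and % in values
        # through without cmd.exe interpreting them.
        result = subprocess.run(secretscli_argv(ref, value), check=False)
        if result.returncode != 0:  # assumption: nonzero exit means failure
            failed.append(ref)
    return failed
```

Running it once with `dry_run=True` and comparing the printed refs against what the failing sync runs reported is a cheap check that nothing is missing before the real pass.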

Restore order matters

Tables have FK chains (mesh objects → relationships, attributes, audit; runs → connector logs). EF migrations define these foreign keys, but a full-database restore is atomic, so table order is irrelevant to the restore itself. Order matters only for:

Selective restore of a subset of tables (don’t): always restore the full DB or use a per-environment refresh script that respects FK order. There is no built-in selective restore tool.
Manual deletion of test data: DELETE /api/system/data in the Admin API clears the operational tables in the correct dependency order.
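If you do maintain a per-environment refresh script, the FK order can be derived instead of hand-maintained. A sketch using Python's standard graphlib (3.9+); the table names are hypothetical stand-ins for the chains listed above, and a real script would build the graph from SQL Server's sys.foreign_keys catalog view.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical table names. Each key is a parent table; its value is the set
# of tables holding an FK that points at it.
FK_CHILDREN: Dict[str, Set[str]] = {
    "MeshObjects": {"Relationships", "Attributes", "Audit"},
    "Runs": {"ConnectorLogs"},
}


def delete_order(children: Dict[str, Set[str]]) -> List[str]:
    """Children before parents, so no DELETE trips an FK."""
    # TopologicalSorter expects node -> predecessors. Listing each parent's
    # children as its predecessors puts the children first. Raises
    # graphlib.CycleError if the FKs form a cycle.
    return list(TopologicalSorter(children).static_order())


def insert_order(children: Dict[str, Set[str]]) -> List[str]:
    """Parents before children: the reverse of the delete order."""
    return list(reversed(delete_order(children)))
```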

Recovery objectives

These are the RPO / RTO targets IdentityMesh commits to as a product baseline. They’re grounded in what the current single-instance + shared-SQL architecture actually delivers — not aspirational numbers. Customer-specific SLAs can be tighter, but only by adding backup infrastructure (log shipping, AlwaysOn AG, warm standby) around the product, not by changing the product itself.

RPO (data loss on failure)

RPO is a direct function of your SQL backup cadence. IdentityMesh itself produces no state outside SQL and the license file — both easy to back up frequently.

Tier	SQL backup strategy	RPO target	Extra setup
Default	Daily full (provided sample in this doc).	≤ 24 h	None — matches the shipped example.
Standard	Daily full + hourly differential.	≤ 1 h	SQL Agent job for the differential.
Enterprise	Daily full + 15-min transaction log backups.	≤ 15 min	SQL set to FULL recovery model + log backup job.
Zero-loss	AlwaysOn Availability Group with synchronous commit to a secondary.	≈ 0 (committed transactions preserved)	AG cluster, separate network.
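For the Standard and Enterprise tiers, each differential or log backup needs its own file per run, unlike the fixed-name daily sample. A sketch that builds the T-SQL a SQL Agent job step (or sqlcmd) would execute, with timestamped file names; the naming scheme and helper names are assumptions. The worst-case RPO of a schedule is the interval of its most frequent backup, which the last function reads off. Log backups also require ALTER DATABASE IdentityMesh SET RECOVERY FULL followed by a full backup to start the log chain.

```python
from datetime import datetime
from pathlib import PureWindowsPath
from typing import Dict, List, Tuple

_TEMPLATES = {
    "full": "BACKUP DATABASE [{db}] TO DISK = N'{path}' WITH COMPRESSION, CHECKSUM;",
    "diff": "BACKUP DATABASE [{db}] TO DISK = N'{path}' WITH DIFFERENTIAL, COMPRESSION, CHECKSUM;",
    "log": "BACKUP LOG [{db}] TO DISK = N'{path}' WITH COMPRESSION, CHECKSUM;",
}


def backup_statement(kind: str, when: datetime, db: str = "IdentityMesh",
                     folder: str = r"D:\Backups") -> str:
    """T-SQL for one backup run, written to a file unique to this timestamp."""
    if kind not in _TEMPLATES:
        raise ValueError(f"unknown backup kind: {kind!r}")
    ext = "trn" if kind == "log" else "bak"
    path = PureWindowsPath(folder) / f"{db}_{kind}_{when:%Y%m%d_%H%M}.{ext}"
    return _TEMPLATES[kind].format(db=db, path=path)


# Tier -> [(backup kind, interval in minutes)], following the RPO table above.
TIER_SCHEDULES: Dict[str, List[Tuple[str, int]]] = {
    "Default": [("full", 24 * 60)],
    "Standard": [("full", 24 * 60), ("diff", 60)],
    "Enterprise": [("full", 24 * 60), ("log", 15)],
}


def worst_case_rpo_minutes(tier: str) -> int:
    # A failure just before the next run loses one interval of the most frequent backup.
    return min(interval for _, interval in TIER_SCHEDULES[tier])
```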

The license file doesn’t contribute to RPO — it only determines tier limits and can be re-issued on request if the file itself is lost.

RTO (time to recover)

RTO is dominated by three serial steps: SQL restore time, service cycle + verification, and (DR only) secret re-provisioning. Restore time scales with database size, so the commitment is tiered.

Dataset size	Same-host RTO	DR-to-new-host RTO
≤ 100k mesh objects	≤ 1 h	≤ 2 h
≤ 1M mesh objects	≤ 2 h	≤ 4 h
≤ 10M mesh objects	≤ 4 h	≤ 8 h

The DR-to-new-host column includes the mandatory secret-re-provisioning step: every entry in IM_Secrets must be re-set via secretscli because DPAPI-encrypted blobs don’t travel between machines (see secrets-and-dpapi.md). Budget ~30 seconds per secret for an operator who has the values handy; more if you have to go fetch them from a password manager first. Rule of thumb: DR RTO ≈ SQL restore time (which scales with DB size) + service cycle and verification + (seconds_per_secret × secret_count).
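The rule of thumb is a serial sum, so it is easy to sanity-check a deployment against the table. A sketch with illustrative inputs, not measurements: 45 minutes of restore, 15 minutes of service cycle and verification, and 40 secrets at the 30-second budget.

```python
def estimate_dr_rto_minutes(restore_min: float, service_cycle_min: float,
                            secret_count: int, seconds_per_secret: float = 30.0) -> float:
    """Serial sum of the three DR steps, in minutes."""
    reprovision_min = secret_count * seconds_per_secret / 60.0
    return restore_min + service_cycle_min + reprovision_min


# Illustrative numbers only: 45 + 15 + 40 * 30 s (= 20 min).
print(estimate_dr_rto_minutes(45, 15, 40))  # 80.0
```

80 minutes fits the ≤ 2 h DR target for ≤ 100k mesh objects. With 200 secrets, re-provisioning alone takes 100 minutes, and the same deployment would land at 160 minutes and miss it.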

What invalidates these targets

These numbers assume:

The SQL backup itself is intact. Run RESTORE VERIFYONLY against it at backup time: a corrupt backup turns the entire RTO into “restore to last known-good plus replay”, which can blow past the table.
The license file has been backed up (or you can re-issue one). An unlicensed service runs at Starter limits and may refuse further mesh-object growth depending on the tier.
Secret values are recoverable. If you lose both the source host (and with it the DPAPI key that can decrypt the old blobs) and your out-of-band copies of the values, recovery time is unbounded; prefer a secret manager over “it lives in the operator’s head”.

Validating the targets

See the DR drill checklist below. The drill is the source of truth for whether the RTO commitment actually holds for your deployment — if a drill ran long, update the target, don’t leave the published number aspirational.

DR drill checklist

Run this annually:

Take a full SQL backup of production.
Restore it onto a fresh Windows host (different machine name).
Install IdentityMesh on the new host pointing at the restored DB, and restore license.key to %CommonApplicationData%\IdentityMesh\.
Confirm: dashboard loads, license shows valid, connectors list populates.
Run a sync against a known connector. It should fail at the secret-decrypt step (InvalidOperationException), because the DPAPI blobs were encrypted on the old host.
Re-provision the affected secrets via secretscli set.
Re-run the sync; it should succeed.
Record wall-clock time for each step (SQL restore, service start, secret re-provisioning, smoke sync) and compare the total against the DR-to-new-host column of the RTO table. If the drill ran long, update the published target rather than keeping an aspirational one.
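Timing the drill is easier if each step is wrapped as it happens rather than reconstructed afterwards. A minimal sketch; DR_RTO_TARGETS_H copies the DR-to-new-host column of the RTO table above, and the class and function names are illustrative.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple


class DrillTimer:
    """Collects elapsed minutes per drill step."""

    def __init__(self) -> None:
        self.minutes: Dict[str, float] = {}

    @contextmanager
    def step(self, name: str) -> Iterator[None]:
        start = time.monotonic()
        try:
            yield
        finally:
            # Recorded even if the step raises, so a failed step still shows its time.
            self.minutes[name] = (time.monotonic() - start) / 60.0

    def total(self) -> float:
        return sum(self.minutes.values())


# DR-to-new-host column of the RTO table: (max mesh objects, target hours).
DR_RTO_TARGETS_H: List[Tuple[int, float]] = [(100_000, 2), (1_000_000, 4), (10_000_000, 8)]


def dr_target_hours(mesh_objects: int) -> float:
    for limit, hours in DR_RTO_TARGETS_H:
        if mesh_objects <= limit:
            return hours
    raise ValueError("dataset is larger than the published RTO table covers")


def within_target(timer: DrillTimer, mesh_objects: int) -> bool:
    return timer.total() <= dr_target_hours(mesh_objects) * 60
```

Each manual step can be wrapped as `with timer.step("SQL restore"): input("Press Enter when the restore completes... ")`, and `timer.minutes` then goes straight into the drill record.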

See also

secrets-and-dpapi.md — why secrets don’t survive a host change, recovery procedures.
upgrades.md — the upgrade-time analog of this document.
installer.md — initial install procedure.