Backup and Restore
This document covers what to back up, the order to restore in, and the gotchas that bite when something goes wrong at 2am.
What needs to be backed up
| Item | Location | Notes |
|---|---|---|
| SQL database | The IdentityMesh DB on the configured SQL Server. | Authoritative for connectors, schedules, mesh objects, run history, audit, secret blobs. |
| License file | %CommonApplicationData%\IdentityMesh\license.key (on the host running the service / API). | Signed RSA-4096 license payload. Without it, the service falls back to Starter trial after a reload. |
| Trial marker | %CommonApplicationData%\IdentityMesh\trial-started.dat | Holds the trial-start UTC timestamp. Lose it and the 30-day trial clock restarts (mostly cosmetic if you have a real license). |
| Connector DLLs | Connectors\ subfolder of the relay agent install dir, and the matching folder for the in-process service. | Not in the DB. Replaced at upgrade time; see upgrades.md. |
| Service binaries + appsettings | Service install dir under %ProgramFiles%\IdentityMesh\ (or wherever the MSI deployed to). | Re-installable via the MSI; only the appsettings.{Environment}.json overrides need backup if customized. |
| Relay agent appsettings | %ProgramFiles%\IdentityMeshRelayAgent\appsettings.json on each agent host. | Contains Relay:HubUrl, Relay:AgentId, Relay:ApiKey. Re-keyable but operationally easier to back up. |
| Log files | logs\*.log in each component’s working directory (Serilog rolling daily). | Useful for forensics; not required for service recovery. |
What does NOT survive a restore as-is
The most important caveat: IM_Secrets blobs are DPAPI-encrypted under
the host’s LocalMachine key. They restore intact byte-for-byte but
cannot be decrypted on a different host. See
secrets-and-dpapi.md for full detail.
This means: a SQL backup is sufficient for data recovery on the same
host. For host migration or DR to a different machine, you also
need to re-provision every secret via secretscli set <ref> <value>.
Backup procedure
Daily / scheduled
- SQL full backup (or differential on top of a recent full):

      BACKUP DATABASE IdentityMesh TO DISK = 'D:\Backups\IdentityMesh_full.bak' WITH FORMAT, INIT, COMPRESSION, CHECKSUM;

  Retain per your standard policy. Run RESTORE VERIFYONLY against the backup file as a smoke test.
- License file and trial marker (%CommonApplicationData% is the .NET special-folder name; in a command prompt the same directory is %ProgramData%):

      robocopy "%ProgramData%\IdentityMesh" "D:\Backups\IdentityMesh\config" license.key trial-started.dat

- Relay agent configs (per agent host):

      robocopy "%ProgramFiles%\IdentityMeshRelayAgent" "\\backup\share\relay-{agentname}" appsettings.json
One-time / on change
- After installer upgrade: snapshot the Connectors\ folders for both the service and each relay so you can roll back if a connector regresses.
- After issuing a new license file: snapshot the new license.key.
Restore procedure
Same-host recovery (host hasn’t changed)
- Stop the IdentityMesh services:

      sc stop IdentityMeshEngine
      sc stop IdentityMeshAdmin

  Stop relay agent services on remote hosts if they’re going to retry imports against an inconsistent state.
- Restore SQL:

      RESTORE DATABASE IdentityMesh FROM DISK = 'D:\Backups\IdentityMesh_full.bak' WITH REPLACE, RECOVERY;

- Restore the license file to %CommonApplicationData%\IdentityMesh\ if it was lost.
- Start services:

      sc start IdentityMeshEngine
      sc start IdentityMeshAdmin

- Verify via the Admin UI dashboard (run history, license status, connector list) and a smoke-test sync run.
Different-host recovery (host migration / DR)
Same steps as above, plus:
- After the service is up, every secret in IM_Secrets will fail to decrypt with InvalidOperationException (post-C3 hardening; it used to fail silently). The first sync run that needs an authenticated connector will surface this.
- Re-provision every secret on the new host:

      secretscli set secret://ad/svc-imadmin/password "..."
      secretscli set secret://sql/hr-source/password "..."

- Restart the service after re-provisioning so any cached license / auth material is reloaded.
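With more than a handful of secrets, typing each secretscli set call by hand is slow and error-prone. The following is a minimal sketch of batching the calls, not a shipped tool. It assumes secretscli is on the PATH, that it returns a non-zero exit code on failure, and that the operator supplies the ref-to-value mapping from a password manager. The helper names build_commands and reprovision are hypothetical.

```python
import subprocess
from typing import Dict, List


def build_commands(secrets: Dict[str, str], cli: str = "secretscli") -> List[List[str]]:
    """Return one argv list per secret, ordered by secret ref."""
    return [[cli, "set", ref, value] for ref, value in sorted(secrets.items())]


def reprovision(secrets: Dict[str, str], dry_run: bool = True) -> List[str]:
    """Run (or, in dry-run mode, only list) the commands; return the refs handled."""
    done = []
    for argv in build_commands(secrets):
        if dry_run:
            # Show the ref but never the value, so secrets stay out of console logs.
            print("would run:", argv[0], argv[1], argv[2], "<value>")
        else:
            # Assumption: a non-zero exit code means the set failed.
            subprocess.run(argv, check=True)
        done.append(argv[2])
    return done
```

Running it with dry_run=True first gives a checklist of refs to compare against the rows in IM_Secrets before anything is written.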
Restore order matters
Tables have FK chains (mesh objects → relationships, attributes, audit; runs → connector logs). EF migrations create the dependency graph; SQL restore is atomic so order is moot for restore itself. Order only matters for:
- Selective restore of a subset of tables (don’t): always restore the full DB or use a per-environment refresh script that respects FK order. There is no built-in selective restore tool.
- Manual deletion of test data: DELETE /api/system/data in the Admin API clears the operational tables in the correct dependency order.
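For anyone writing their own refresh script, "respects FK order" means deleting child rows before the parents they reference. A minimal sketch of deriving that order with a topological sort follows. The table names are hypothetical stand-ins for the FK chains described above, not the actual schema, and this is not how the Admin API is known to implement it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical table names: parent table -> child tables holding an FK to it.
FK_CHILDREN = {
    "MeshObjects": ["Relationships", "Attributes", "Audit"],
    "Runs": ["ConnectorLogs"],
}


def delete_order(fk_children):
    """Return table names so that every child comes before its parent.

    TopologicalSorter expects node -> predecessors. For deletion, a parent's
    predecessors are its children, so the mapping can be passed as-is.
    """
    return list(TopologicalSorter(fk_children).static_order())


if __name__ == "__main__":
    for table in delete_order(FK_CHILDREN):
        print("DELETE FROM", table)
```

Reversing the returned list gives a valid insert order for reloading data.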
Recovery objectives
These are the RPO / RTO targets IdentityMesh commits to as a product baseline. They’re grounded in what the current single-instance + shared-SQL architecture actually delivers — not aspirational numbers. Customer-specific SLAs can be tighter, but only by adding backup infrastructure (log shipping, AlwaysOn AG, warm standby) around the product, not by changing the product itself.
RPO (data loss on failure)
RPO is a direct function of your SQL backup cadence. IdentityMesh itself produces no state outside SQL and the license file — both easy to back up frequently.
| Tier | SQL backup strategy | RPO target | Extra setup |
|---|---|---|---|
| Default | Daily full (provided sample in this doc). | ≤ 24 h | None — matches the shipped example. |
| Standard | Daily full + hourly differential. | ≤ 1 h | SQL Agent job for the differential. |
| Enterprise | Daily full + 15-min transaction log backups. | ≤ 15 min | SQL set to FULL recovery model + log backup job. |
| Zero-loss | AlwaysOn Availability Group with synchronous commit to a secondary. | ≈ 0 (committed transactions preserved) | AG cluster, separate network. |
The license file doesn’t contribute to RPO — it only determines tier limits and can be re-issued on request if the file itself is lost.
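The tier table can be read as a lookup: given the data loss you can tolerate, pick the least demanding backup strategy whose RPO target still fits. A small sketch of that lookup follows, with the numbers copied from the table. Treating the Zero-loss tier as exactly 0 minutes is a simplifying assumption, and cheapest_tier is a hypothetical helper name.

```python
# (tier name, RPO target in minutes), loosest first, from the table above.
TIERS = [
    ("Default", 24 * 60),
    ("Standard", 60),
    ("Enterprise", 15),
    ("Zero-loss", 0),  # assumption: "approximately 0" modelled as 0
]


def cheapest_tier(required_rpo_minutes: float) -> str:
    """Return the first (least demanding) tier whose RPO target meets the requirement."""
    for name, rpo_minutes in TIERS:
        if rpo_minutes <= required_rpo_minutes:
            return name
    raise ValueError("required RPO must be zero or positive")
```

For instance, a requirement of 30 minutes rules out Default and Standard and lands on Enterprise.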
RTO (time to recover)
RTO is dominated by three serial steps: SQL restore time, service cycle + verification, and (DR only) secret re-provisioning. Restore time scales with database size, so the commitment is tiered.
| Dataset size | Same-host RTO | DR-to-new-host RTO |
|---|---|---|
| ≤ 100k mesh objects | ≤ 1 h | ≤ 2 h |
| ≤ 1M mesh objects | ≤ 2 h | ≤ 4 h |
| ≤ 10M mesh objects | ≤ 4 h | ≤ 8 h |
The DR-to-new-host column includes the mandatory secret-re-provisioning
step: every entry in IM_Secrets must be re-set via secretscli
because DPAPI-encrypted blobs don’t travel between machines (see
secrets-and-dpapi.md). Budget ~30 seconds
per secret for an operator who has the values handy; more if you
have to go fetch them from a password manager first. Rule of thumb:
DR RTO ≈ SQL restore time (which scales with DB size) + service cycle
and verification + (seconds_per_secret × secret_count).
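To make the rule of thumb concrete, here is a small sketch of the arithmetic. The restore and service-cycle durations in the example are illustrative inputs, not measurements; only the 30-seconds-per-secret figure comes from the text above. dr_rto_minutes is a hypothetical helper name.

```python
def dr_rto_minutes(restore_minutes: float,
                   service_cycle_minutes: float,
                   secret_count: int,
                   seconds_per_secret: float = 30.0) -> float:
    """Estimate DR recovery time as the sum of the three serial steps."""
    secret_minutes = secret_count * seconds_per_secret / 60.0
    return restore_minutes + service_cycle_minutes + secret_minutes


# Illustrative inputs: 45 min restore, 15 min service cycle, 120 secrets.
# 120 secrets x 30 s = 60 min, so the estimate is 45 + 15 + 60 = 120 minutes.
estimate = dr_rto_minutes(45, 15, 120)
```

With these made-up inputs the estimate sits right at the 2 h DR target for the smallest dataset tier, which shows how quickly secret count can dominate the budget.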
What invalidates these targets
These numbers assume:
- The SQL backup itself is intact. RESTORE VERIFYONLY it at backup time; a corrupt backup turns the entire RTO into “restore to last known-good plus replay”, which can blow past the table.
- The license file has been backed up (or you can re-issue one). An unlicensed service runs at Starter limits and may refuse further mesh-object growth depending on the tier.
- Secret values are recoverable. Losing the DPAPI material on the source host and the out-of-band values is an unbounded recovery — prefer a secret manager over “it lives in the operator’s head”.
Validating the targets
See the DR drill checklist below. The drill is the source of truth for whether the RTO commitment actually holds for your deployment — if a drill ran long, update the target, don’t leave the published number aspirational.
DR drill checklist
Run this annually:
- Take a full SQL backup of production.
- Restore to a fresh Windows host (different machine name).
- Install IdentityMesh on the new host pointing at the restored DB.
- Confirm: dashboard loads, license shows valid, connectors list populates.
- Run a sync against a known connector — it will fail authentication on the secret-decrypt step.
- Re-provision the affected secrets via secretscli set.
- Re-run the sync; it should succeed.
- Record wall-clock time for each step (SQL restore, service start, secret re-provisioning, smoke sync). Compare against the RTO table — if the drill ran long, update the published target rather than keeping an aspirational one.
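Recording per-step wall-clock time is easier if the drill operator wraps each step in a timer. A minimal sketch of that follows, assuming the steps are driven from (or at least bracketed by) a Python session. DrillLog and its methods are hypothetical names, and the target in hours is whatever row of the RTO table applies to your dataset size.

```python
import time
from contextlib import contextmanager


class DrillLog:
    """Collects elapsed seconds per drill step and compares the total to a target."""

    def __init__(self):
        self.steps = {}

    @contextmanager
    def step(self, name):
        start = time.monotonic()
        try:
            yield
        finally:
            # Record the duration even if the step raised, so failures are timed too.
            self.steps[name] = time.monotonic() - start

    def total_seconds(self):
        return sum(self.steps.values())

    def within_target(self, target_hours):
        return self.total_seconds() <= target_hours * 3600


if __name__ == "__main__":
    log = DrillLog()
    with log.step("SQL restore"):
        pass  # perform or wait for the restore here
    with log.step("secret re-provisioning"):
        pass  # perform the secretscli set calls here
    print(log.steps, "within 2 h target:", log.within_target(2))
```

Keeping the per-step numbers, not just the total, shows which step to attack when a drill runs long.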
Related docs
- secrets-and-dpapi.md: why secrets don’t survive a host change, recovery procedures.
- upgrades.md: the upgrade-time analog of this document.
- installer.md: initial install procedure.