Security model
This document describes the trust boundaries of the OpenInfra control plane + agent system, the assets it protects, the threats against each boundary, and the mitigations in place today. It is deliberately honest about residual and accepted risks — a threat model that only lists wins is not useful for the next person hardening the system.
Scope: the live worldwide deployment shape — one control plane on a
public VPS (cp.seppelabs.com), provider agents that dial out from
untrusted networks, and tenants that submit container workloads over
a public REST API. See deploy/seppelabs/README.md for the deployment
runbook and ARCHITECTURE.md for the full system design.
1. System overview & trust boundaries
Deployment diagram, summarized (tenant and agent traffic both cross the public internet):

- Tenant (API key) → HTTPS (Caddy TLS) → :443 on the control plane (VPS).
- Agent (client cert) → gRPC mTLS (outbound dial) → :9090 on the control plane (VPS).
- Control plane (VPS): api-server (chi REST :8080), gRPC server :9090, postgres (compose net only), registry (compose net only), pg-backup → Backblaze B2.
- Provider host (e.g. Kuma, home box): runs the agent + tenant containers; docker.sock mounted; tenant images pulled & executed here.

Trust boundaries (where data crosses a privilege level):
| # | Boundary | Crossing | Authentication |
|---|---|---|---|
| B1 | Tenant → control plane | public HTTPS REST | API key (oi_…) or admin JWT |
| B2 | Agent → control plane | public gRPC | mutual TLS (client cert) |
| B3 | Control plane → provider host | workload dispatch over B2’s stream | (rides the mTLS channel) |
| B4 | Tenant workload → provider host kernel | container runtime | OS/container isolation |
| B5 | Control plane → Postgres / registry | compose-internal | not published to host |
| B6 | Control plane → Backblaze B2 | outbound HTTPS | scoped application key |
| B7 | Control plane → Solana devnet | outbound RPC | treasury keypair |
2. Assets
What an attacker would want, ranked by blast radius:
- The settlement ledger (transactions, double-entry). Tampering = theft of credits / unbilled compute. Integrity is paramount.
- Tenant secrets injected into workloads (DB URLs, cloud creds in env_template / the per-dispatch secrets map). Confidentiality.
- API keys & JWT secret. A leaked tenant key spends that tenant’s credits and runs workloads as them; a leaked JWT_SECRET forges admin sessions → full control-plane compromise.
- The mTLS CA private key. Signs agent client certs; leak = an attacker can register a rogue provider host.
- The Postgres database (everything above at rest).
- The Solana treasury keypair (on-chain settlement authority).
- Provider host integrity: tenants run code on someone else’s box.
3. Boundary-by-boundary threats & mitigations
B1 — Tenant → control plane (public REST)
Authentication. Bearer token. A token that LooksLikeKey (oi_
prefix) is routed to the API-key branch; any failure there (revoked,
expired, suspended tenant) returns 401 and does not fall through to
the JWT path (internal/api/middleware/apikey.go, AuthEither). This
closes the “present a stolen API key as a JWT” confusion class.
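The branch selection described above can be sketched as follows. This is a minimal illustration, not the code in internal/api/middleware/apikey.go: the Principal type, the lowercase function names, and the injected verifier functions are all assumptions made so the example runs on its own.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Principal is a hypothetical result type for this sketch; the real
// middleware is described as resolving a tenant ID into the request context.
type Principal struct {
	TenantID string
	Via      string // "apikey" or "jwt"
}

// errUnauthorized stands in for an HTTP 401 response.
var errUnauthorized = errors.New("unauthorized")

// looksLikeKey mirrors the oi_ prefix check described above.
func looksLikeKey(token string) bool { return strings.HasPrefix(token, "oi_") }

// authEither picks exactly one branch per token. verifyKey and verifyJWT are
// injected stand-ins for the real verifiers (assumed signatures).
func authEither(token string, verifyKey, verifyJWT func(string) (string, error)) (Principal, error) {
	if looksLikeKey(token) {
		tenant, err := verifyKey(token)
		if err != nil {
			// Revoked, expired, or suspended: stop here, never try the JWT path.
			return Principal{}, errUnauthorized
		}
		return Principal{TenantID: tenant, Via: "apikey"}, nil
	}
	tenant, err := verifyJWT(token)
	if err != nil {
		return Principal{}, errUnauthorized
	}
	return Principal{TenantID: tenant, Via: "jwt"}, nil
}

func main() {
	revoked := func(string) (string, error) { return "", errors.New("revoked") }
	permissive := func(string) (string, error) { return "tenant-a", nil }

	// Even with a JWT verifier that accepts anything, a failed key is rejected.
	_, err := authEither("oi_live_abc", revoked, permissive)
	fmt.Println(err)
}
```

The point of the early return is that the token's shape alone decides which verifier runs, so a rejected key can never be re-interpreted by a more lenient path.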
- API keys are 16 bytes from crypto/rand, formatted oi_{env}_{base32}, stored only as a SHA-256 hash; lookup is by hash, comparison is constant-time (crypto/subtle). Plaintext is shown once at mint time and never persisted (internal/auth/apikey).
- Keys carry scopes; RequireScope (403) narrows machine principals. Admin JWT sessions carry implicit-all, which is the intended delegation model (humans use the portal with full-power sessions, integrations use least-privilege scoped keys).
- Per-tenant isolation: AuthEither always resolves TenantIDKey, so every handler scopes queries to the caller’s tenant regardless of auth branch. Threat: IDOR / cross-tenant access. It is mitigated by tenant scoping at the handler layer; this is the single most important invariant to preserve when adding endpoints (see §6 checklist).
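As an illustration of the key lifecycle in the first bullet, here is a minimal sketch of minting and verifying a key. It assumes unpadded, lowercased standard base32 and a hash over the full plaintext; internal/auth/apikey may differ on both, and mintKey and verifyKey are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base32"
	"fmt"
	"strings"
)

// mintKey returns the plaintext (to be shown once) and the SHA-256 hash,
// which is the only value that would be persisted.
func mintKey(env string) (plaintext string, hash [32]byte, err error) {
	raw := make([]byte, 16) // 128 bits of entropy from crypto/rand
	if _, err = rand.Read(raw); err != nil {
		return "", hash, err
	}
	// Assumption: unpadded standard base32, lowercased for readability.
	enc := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(raw)
	plaintext = "oi_" + env + "_" + strings.ToLower(enc)
	return plaintext, sha256.Sum256([]byte(plaintext)), nil
}

// verifyKey hashes the presented token and compares it to the stored hash
// in constant time, so timing does not reveal how many bytes matched.
func verifyKey(presented string, stored [32]byte) bool {
	got := sha256.Sum256([]byte(presented))
	return subtle.ConstantTimeCompare(got[:], stored[:]) == 1
}

func main() {
	key, hash, err := mintKey("test")
	if err != nil {
		panic(err)
	}
	fmt.Println(verifyKey(key, hash), verifyKey(key+"x", hash))
}
```

Because only the hash is stored, a database leak does not directly yield usable keys, and a lost key can only be replaced, not recovered.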
Transport. Caddy terminates TLS with an auto-provisioned Let’s Encrypt cert. HTTP→HTTPS redirect; only 80/443/9090 are open on the firewall (Postgres 5432 and registry 5000 are not host-published).
Input handling. Customer-supplied secret maps for batch jobs are
validated by internal/secretrules against a manifest-declared spec
(required keys, pattern/enum/int-range, url/url_list). URL rules can
require an SSRF guard that rejects loopback, link-local, RFC1918,
ULA, and “this network” targets (ssrf.go) — so a tenant cannot coerce
a workload into fetching http://169.254.169.254/… cloud metadata.
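A minimal sketch of the address check behind such a guard is shown below. The prefix list and the isBlockedAddr name are assumptions for illustration and are not taken from ssrf.go; a production guard would also need to check the addresses a hostname actually resolves to at connect time.

```go
package main

import (
	"fmt"
	"net/netip"
)

// blockedPrefixes lists the ranges named above (illustrative, not exhaustive).
var blockedPrefixes = []netip.Prefix{
	netip.MustParsePrefix("0.0.0.0/8"),      // "this network"
	netip.MustParsePrefix("127.0.0.0/8"),    // IPv4 loopback
	netip.MustParsePrefix("10.0.0.0/8"),     // RFC1918
	netip.MustParsePrefix("172.16.0.0/12"),  // RFC1918
	netip.MustParsePrefix("192.168.0.0/16"), // RFC1918
	netip.MustParsePrefix("169.254.0.0/16"), // IPv4 link-local (cloud metadata)
	netip.MustParsePrefix("::/128"),         // IPv6 unspecified
	netip.MustParsePrefix("::1/128"),        // IPv6 loopback
	netip.MustParsePrefix("fe80::/10"),      // IPv6 link-local
	netip.MustParsePrefix("fc00::/7"),       // IPv6 unique local (ULA)
}

// isBlockedAddr reports whether a literal IP address falls in a blocked range.
// Unparseable input is treated as blocked (fail closed).
func isBlockedAddr(s string) (bool, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return true, err
	}
	// Unmap so ::ffff:10.0.0.1 is checked as 10.0.0.1, and drop any zone,
	// because Prefix.Contains never matches a zoned address.
	addr = addr.Unmap().WithZone("")
	for _, p := range blockedPrefixes {
		if p.Contains(addr) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, s := range []string{"169.254.169.254", "10.1.2.3", "93.184.216.34"} {
		blocked, _ := isBlockedAddr(s)
		fmt.Println(s, blocked)
	}
}
```

Checking only the literal host in a URL is not sufficient on its own: a hostname can resolve to an internal address, so the same test belongs on the resolved IP as well.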
Residual / accepted:
- Rate limiting is not yet enforced at the edge. A leaked key or a hostile tenant can hammer the API. rules/common/security.md calls for per-endpoint rate limiting; this is an open item (see §7).
- DoS via expensive parsing on unauthenticated endpoints. The public /register handler calls mail.ParseAddress; govulncheck flagged GO-2025-4006 (CPU-DoS in that function) as reachable. Mitigated by pinning the build toolchain to a patched Go (go.mod → toolchain go1.25.11) and enforcing it in CI (govulncheck job). The live binary clears this on its next rebuild+redeploy (Dockerfile floats golang:1.25-alpine to the latest patch).
B2 — Agent → control plane (gRPC, mutual TLS)
The agent dials out; the control plane never initiates a connection
to a provider host, so providers need no inbound exposure or VPN. The
channel is mutual TLS: the agent verifies the server against the
openinfra-server SAN (so the public hostname need not be in the cert),
and the server verifies the agent’s client cert against the project CA
(internal/certs, cmd/gen-certs). Onboarding may additionally require
an invite token.
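The two sides of this channel can be sketched with the standard crypto/tls package. This is an assumption-laden illustration rather than the contents of internal/certs: the function names, the TLS 1.3 floor, and the placeholder certificates are all hypothetical; only the openinfra-server name and the mutual verification come from the text above.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// serverTLS requires a client certificate and verifies it against the
// project CA pool before the handshake completes.
func serverTLS(ca *x509.CertPool, serverCert tls.Certificate) *tls.Config {
	return &tls.Config{
		MinVersion:   tls.VersionTLS13, // assumption: not stated in the source
		Certificates: []tls.Certificate{serverCert},
		ClientAuth:   tls.RequireAndVerifyClientCert,
		ClientCAs:    ca,
	}
}

// agentTLS verifies the server against a fixed SAN instead of the dialed
// hostname, which is why the public hostname need not appear in the cert.
func agentTLS(ca *x509.CertPool, clientCert tls.Certificate) *tls.Config {
	return &tls.Config{
		MinVersion:   tls.VersionTLS13, // assumption: not stated in the source
		Certificates: []tls.Certificate{clientCert},
		RootCAs:      ca,
		ServerName:   "openinfra-server",
	}
}

// demoConfigs builds both configs with an empty pool and placeholder
// certificates, only to show the resulting settings; real code would load
// the CA and key pairs from PEM files.
func demoConfigs() (server, agent *tls.Config) {
	pool := x509.NewCertPool()
	return serverTLS(pool, tls.Certificate{}), agentTLS(pool, tls.Certificate{})
}

func main() {
	server, agent := demoConfigs()
	fmt.Println(server.ClientAuth == tls.RequireAndVerifyClientCert, agent.ServerName)
}
```

Note that neither config disables verification: the agent trusts only the project CA, and the server rejects any peer without a certificate chained to that same CA.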
Residual / accepted:
- CA key custody. The CA private key signs all agent certs. It is gitignored and lives only in deploy/seppelabs/certs/. Leak = rogue host registration. No HSM; rotation is manual. Accepted at current scale; revisit before onboarding third-party providers.
- No cert revocation list (CRL/OCSP). A compromised agent cert is valid until expiry. Mitigation today is operational (rotate the CA / re-issue). Open item for multi-provider scale.
B3/B4 — Workload execution on the provider host
This is the highest-trust boundary in a DePIN system: the provider runs tenant-supplied container images on their own hardware.
- The agent mounts docker.sock to launch workloads. Tenant code runs in a container, not on the host directly, but container ≠ VM isolation.
- OPENINFRA_NETWORK_MODE can place a workload in another container’s network namespace (e.g. container:coledex-tailscale) to reach a private DB. This is powerful and is set by the service manifest (operator-controlled), not by the tenant request.
Residual / accepted:
- Container escape (kernel/runtime 0-day) would compromise the provider host. Today’s tenants are first-party (Coledex), so this is an accepted risk. Before running untrusted third-party images, this boundary needs hardening: rootless/userns runtimes, seccomp/AppArmor profiles, or gVisor/Kata (VM-isolation; internal/agent/vm exists as a seam). Do not onboard untrusted tenant images until then.
- A workload joining another container’s netns can talk to whatever that container can reach. Manifests granting container: network mode are a privilege grant and should be reviewed like one.
B5 — Datastores (Postgres, registry)
Neither is published to the host. They exist only on the compose network, reachable solely by services on that network (the api-server, plus the pg-backup sidecar for Postgres dumps). Compromise of B5 requires first compromising the control-plane container or the box.
Residual: secrets and the ledger live in Postgres in plaintext at rest (no column encryption). Disk-level protection is the VPS provider’s; the off-box backups (B6) are a separate confidentiality surface.
B6 — Off-box backups (Backblaze B2)
The pg-backup sidecar streams a daily pg_dump to a private B2
bucket via an application key scoped to that one bucket. The micro
keeps no local copy. Optional age encryption (BACKUP_AGE_RECIPIENT)
encrypts dumps before upload; the private key is kept offline.
Residual / accepted:
- Burn-in runs unencrypted for easier restore verification. The dump contains tenant secrets and the ledger. Enabling age encryption is an open item before this is considered hardened (see §7).
- B2 key compromise exposes all historical dumps. Scope the key to the bucket (done) and rotate on suspicion.
B7 — On-chain settlement (Solana devnet)
Settlement currently runs on devnet with a treasury keypair. Loss of the keypair affects devnet settlement only; no mainnet funds are at risk today. Promotion to mainnet is a separate, deliberate hardening exercise (key custody, multisig) and is out of scope here.
4. Integrity of the credits ledger
The ledger is double-entry: every workload posts balanced debits/credits (tenant→suspense, suspense→provider, platform fee). Invariants:
- Settlement writes go through the ledger code path, never ad-hoc SQL.
- A tenant cannot post entries directly; they submit workloads, and metering + settlement are server-side.
- Backups make the ledger recoverable: deploy/seppelabs/restore-drill.sh is the tested path that restores the newest B2 dump into a throwaway Postgres and asserts row counts. An untested backup is not a backup.
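The balancing invariant can be illustrated with a small sketch. The Entry type, the account naming, the sign convention (negative leaves an account, positive arrives), and the settle and validate helpers are all hypothetical; the real ledger schema is not shown in this document.

```go
package main

import (
	"errors"
	"fmt"
)

// Entry is one leg of a posting, in integer minor units (hypothetical shape).
// Negative amounts leave the account, positive amounts arrive in it.
type Entry struct {
	Account string
	Amount  int64
}

var errUnbalanced = errors.New("ledger: entries do not sum to zero")

// validate rejects any posting whose legs do not net to zero.
func validate(entries []Entry) error {
	if len(entries) < 2 {
		return errUnbalanced
	}
	var sum int64
	for _, e := range entries {
		sum += e.Amount
	}
	if sum != 0 {
		return errUnbalanced
	}
	return nil
}

// settle builds the legs for one workload: tenant to suspense, then suspense
// out to the provider and the platform fee account.
func settle(tenant, provider string, cost, fee int64) ([]Entry, error) {
	if cost <= 0 || fee < 0 || fee > cost {
		return nil, errors.New("ledger: invalid amounts")
	}
	entries := []Entry{
		{Account: "tenant:" + tenant, Amount: -cost},
		{Account: "suspense", Amount: cost},
		{Account: "suspense", Amount: -cost},
		{Account: "provider:" + provider, Amount: cost - fee},
		{Account: "platform:fee", Amount: fee},
	}
	return entries, validate(entries)
}

func main() {
	entries, err := settle("t1", "p1", 1000, 50)
	fmt.Println(len(entries), err)
}
```

Funnelling every write through a single function like this is what makes the "never ad-hoc SQL" rule checkable: an unbalanced posting fails before it reaches the database.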
5. Secrets handling
| Secret | Where | Protection |
|---|---|---|
| JWT_SECRET | control-plane env (.env, gitignored) | 32+ random bytes; signs admin sessions |
| POSTGRES_PASSWORD | .env | not host-exposed; compose net only |
| Tenant API keys | DB (SHA-256 hash) + tenant’s .secrets/ | hashed at rest; shown once |
| mTLS CA + certs | deploy/seppelabs/certs/ (gitignored) | filesystem perms; manual rotation |
| Tenant workload secrets | dispatched per workload | validated (secretrules); injected as env |
| B2 application key | .env | scoped to the backup bucket |
| Solana treasury key | control-plane volume | devnet only today |
Rules enforced in CI / review:
- No hardcoded secrets in source (see rules/common/security.md).
- .env and certs/ are gitignored; .env.example ships placeholders.
- Build toolchain pinned to a patched Go; govulncheck blocks reachable stdlib/dependency CVEs in CI and at release.
6. Checklist when adding an endpoint or handler
Preserve these invariants; they are the load-bearing parts of the model:
- Scope every query to GetTenantID(ctx); never trust a tenant id from the request body/path (prevents cross-tenant IDOR).
- Put machine-facing routes behind RequireScope with the least scope that works; don’t reach for admin-only/JWT unless it’s a human portal action.
- Validate all external input at the boundary; for any outbound fetch derived from tenant input, apply the SSRF guard.
- Never log secrets, API-key plaintext, or full tokens.
- Return generic auth errors (no “user not found” vs “bad password” oracles).
- Add a test that an unauthenticated / wrong-tenant / wrong-scope caller is rejected.
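The first two and the last checklist items can be exercised together in a small sketch. Everything here is illustrative: the context keys, the jobs:read scope, the listJobs handler, and the call helper are hypothetical stand-ins for the project's real middleware and routes.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type ctxKey string

const (
	tenantIDKey ctxKey = "tenant_id" // hypothetical context key
	scopesKey   ctxKey = "scopes"    // hypothetical context key
)

// requireScope returns 403 unless the caller's scopes include the given one.
func requireScope(scope string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		scopes, _ := r.Context().Value(scopesKey).([]string)
		for _, s := range scopes {
			if s == scope {
				next.ServeHTTP(w, r)
				return
			}
		}
		http.Error(w, "forbidden", http.StatusForbidden)
	})
}

// listJobs answers only for the authenticated tenant from the context.
func listJobs(w http.ResponseWriter, r *http.Request) {
	tenant, ok := r.Context().Value(tenantIDKey).(string)
	if !ok || tenant == "" {
		http.Error(w, "unauthorized", http.StatusUnauthorized)
		return
	}
	// The tenant_id query parameter is deliberately never read.
	fmt.Fprintf(w, "jobs for %s", tenant)
}

// call simulates what the auth middleware would have placed in the context,
// while the request itself tries to name a different tenant.
func call(tenant string, scopes []string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/jobs?tenant_id=victim", nil)
	ctx := context.WithValue(req.Context(), tenantIDKey, tenant)
	ctx = context.WithValue(ctx, scopesKey, scopes)
	rec := httptest.NewRecorder()
	requireScope("jobs:read", http.HandlerFunc(listJobs)).ServeHTTP(rec, req.WithContext(ctx))
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(call("tenant-a", []string{"jobs:read"}))
	fmt.Println(call("tenant-a", nil))
}
```

The same call helper pattern extends naturally to a table-driven test covering the unauthenticated, wrong-tenant, and wrong-scope cases for each new route.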
7. Open security items (tracked)
Honest backlog. None of these blocks first-party (Coledex) operation, but each is required before the corresponding expansion:
1. Edge rate limiting on the public API (per-key + per-IP). Required before exposing self-serve signup widely.
2. age-encrypt backups (BACKUP_AGE_RECIPIENT) with an offline/hardware-stored key. Required to consider B6 hardened.
3. Workload isolation hardening (rootless/userns, seccomp/AppArmor, or gVisor/Kata). Required before running untrusted third-party tenant images at B4.
4. Agent cert revocation (CRL/OCSP or short-lived certs + rotation). Required before onboarding third-party providers at B2.
5. CA key custody (HSM/KMS, documented rotation). Same trigger as #4.
6. Alertmanager + paging so the security/health alert rules actually notify (currently visible-only). See deploy/seppelabs/alerts.yml.
7. Mainnet settlement hardening (key custody, multisig) before B7 leaves devnet.
8. gosec static analysis in CI to complement govulncheck (vuln scanning ≠ static analysis).
Maintained alongside the code. When you change an auth path, a trust boundary, or a secret’s handling, update the relevant section here in the same PR.