Local pilot

Walks you from a clean checkout to an end-to-end Coledex backup running on a real provider host (Kuma), settling onchain on Solana devnet, and depositing the encrypted dump in local Minio. A second provider runs as a container on the control plane to exercise multi-host scheduling — no third physical machine required.

Target audience: operator (you), not a developer reading the codebase for the first time.

If anything in this document is out of date relative to the deploy/local-pilot/ scripts, the scripts win. They are the machine-checked source of truth.


                  LAN (e.g. 192.168.0.0/24)
                             |
         +-------------------+--------------------+
         |                                        |
mp1 = this dev box                      mp2 = kuma (ZimaOS)
  control plane                           real provider
  + simulated provider                    - openinfra-agent (systemd)
  - Postgres    (5433)                    - Docker (containers)
  - api-server  (8082, 9092)
  - registry    (5000)
  - minio       (9000/9001)
  - prometheus  (9095)
  - grafana     (3000)
  - customer-pg (5434)
  - agent-mp3-sim        ← second provider as a compose container,
    (mounted on the        spawns workloads on this box's docker.sock
     host's docker.sock)

Both boxes must:

  • Reach each other on the LAN.
  • Resolve mp1’s IP (you’ll set it explicitly — no mDNS dependency).
  • Run Linux + Docker (Kuma already does).

Why a simulated mp3 is OK for the pilot: it lets us exercise multi-host invite issuance, pinned-host scheduling, and SPL distribution across two payout wallets (all the multi-host code paths) without a third box. The simulator shares the control plane’s docker daemon through the mounted socket, so the workloads it spawns run as sibling containers of the api-server. Swapping the sim for a real third machine is straightforward: set USE_REAL_MP3=1 and run the install script against that host. The sim service in compose is harmless to leave running alongside.


On mp1:

# Toolchain — already on this box
go version # 1.25+
docker --version
docker compose version
jq --version
curl --version
# Get your LAN IP (will go everywhere as CONTROL_PLANE_IP)
ip route get 1.1.1.1 | awk '{for(i=1;i<=NF;i++) if($i=="src") print $(i+1)}'
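
The individual version checks above can be bundled into a single preflight guard. The following is a minimal sketch, not part of deploy/local-pilot/; the function name `check_tools` is hypothetical, and it only checks that each command is on PATH, not its version.

```shell
# check_tools: hypothetical helper, not shipped with the pilot scripts.
# Prints each missing command to stderr and returns non-zero if any is absent.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (tool list taken from the preflight above):
#   check_tools go docker jq curl || exit 1
```

Version requirements such as Go 1.25+ still need the manual `go version` check.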

On kuma:

ssh kuma 'docker --version && sudo systemctl is-active docker'

(mp3 is the compose-managed simulator on mp1 — no separate preflight needed.)

Solana payout wallets — create two if you don’t have them:

# Once per provider, on any machine with solana-keygen:
solana-keygen new -o kuma-payout.json --no-bip39-passphrase
solana-keygen new -o mp3-payout.json --no-bip39-passphrase
# Note the public keys — you'll need them for the pilot script:
solana-keygen pubkey kuma-payout.json # → export KUMA_PAYOUT_PUBKEY=...
solana-keygen pubkey mp3-payout.json # → export MP3_PAYOUT_PUBKEY=...

cd ~/Documents/seppelabs/openinfra/deploy/local-pilot
# CONTROL_PLANE_IP is auto-detected; override if your box has VPN
# interfaces and the auto-detect picks the wrong one.
./control-plane-up.sh

What this does:

  1. Builds the openinfra-api Docker image from local source.
  2. docker compose up -d brings up: Postgres, customer-pg, api-server, registry, Minio (+ bootstrap container that creates the bucket), Prometheus, Grafana.
  3. Waits for /healthz on the api-server.
  4. Greps the api-server logs for TREASURY_PUBKEY=… (printed once on first boot when the onchain executor generates the keypair).
  5. Hits the Solana devnet faucet for 1 SOL into the treasury, only if the treasury balance is under 0.1 SOL (don’t burn the faucet on every re-run).
  6. Prints the URL summary + next-step hints.
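
Steps 4 and 5 can be sketched as two small helpers. This is an illustration under assumptions, not the code in control-plane-up.sh: the function names are hypothetical, and the log line is assumed to carry the key as a plain alphanumeric run directly after `TREASURY_PUBKEY=`.

```shell
# Hypothetical helpers; the real logic lives in control-plane-up.sh
# and may differ.

# extract_treasury_pubkey: read log text on stdin and print the value that
# follows TREASURY_PUBKEY= on the first matching line.
extract_treasury_pubkey() {
  sed -n 's/.*TREASURY_PUBKEY=\([A-Za-z0-9]*\).*/\1/p' | head -n 1
}

# needs_airdrop: exit 0 when the given balance (in SOL) is below 0.1,
# non-zero otherwise.
needs_airdrop() {
  awk -v bal="$1" 'BEGIN { exit !(bal + 0 < 0.1) }'
}

# Usage:
#   key=$(docker compose logs api-server | extract_treasury_pubkey)
#   bal=$(solana balance "$key" --url devnet | awk '{print $1}')
#   needs_airdrop "$bal" && solana airdrop 1 "$key" --url devnet
```

Gating the airdrop on the balance keeps re-runs from hitting the faucet rate limit.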

One-time mint (the script tells you the exact command):

# Use the existing OINFRA-test mint from .devnet-keys/ OR mint into it:
spl-token mint 9Jkq8WdgUUp2AR4FeXwE6q4DRddoHGfcKREMeCE6wphT 1000000 \
--owner $TREASURY_PUBKEY --url devnet

If you don’t have spl-token locally:

docker run --rm -it \
-v ~/.config/solana:/root/.config/solana \
solanalabs/solana:latest \
spl-token mint ... --owner $TREASURY_PUBKEY --url devnet

From mp1, for the real provider only:

./provider-install.sh kuma <CONTROL_PLANE_IP> amd64

What it does:

  1. Cross-compiles the agent for linux/<arch>.
  2. scps the binary to the host.
  3. Installs as /usr/local/bin/openinfra-agent.
  4. Configures Docker to trust the LAN registry at <CONTROL_PLANE_IP>:5000 (insecure HTTP — fine on LAN).
  5. Writes a systemd unit and enables it (does not start — needs /etc/openinfra/agent.env which run-pilot.sh writes).
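
For step 4, the Docker side of trusting a plain-HTTP registry is the `insecure-registries` key in /etc/docker/daemon.json. The sketch below shows roughly what such a config looks like; the helper name `render_daemon_json` is hypothetical, and provider-install.sh may write or merge the file differently.

```shell
# render_daemon_json: hypothetical helper that prints a minimal Docker daemon
# config trusting one plain-HTTP registry on port 5000.
# It does NOT merge with an existing daemon.json.
render_daemon_json() {
  printf '{\n  "insecure-registries": ["%s:5000"]\n}\n' "$1"
}

# Usage on the provider (overwrites the file, so back it up first):
#   render_daemon_json 192.168.0.10 | sudo tee /etc/docker/daemon.json
#   sudo systemctl restart docker
```

If the provider already has a daemon.json with other settings, merge the key by hand instead of overwriting the file.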

The simulated mp3 is already running — control-plane-up.sh built its image and started the container as part of docker compose up. Verify with: docker compose ps agent-mp3-sim.

If/when you get a third physical machine, run ./provider-install.sh mp3 <CONTROL_PLANE_IP> and set USE_REAL_MP3=1 before running the pilot — the sim then stays idle (no invite registered against it).


From mp1:

export CONTROL_PLANE_IP=192.168.0.x # your LAN IP
export KUMA_PAYOUT_PUBKEY=... # from the payout-wallet step above
export MP3_PAYOUT_PUBKEY=...
# Generate the customer-side backup recipient (Coledex's side in real life):
age-keygen -o backup-customer.key
export BACKUP_AGE_RECIPIENT=$(age-keygen -y backup-customer.key)
echo "$BACKUP_AGE_RECIPIENT" # age1...
./run-pilot.sh

The script walks 8 numbered steps and bails out with a clear error if any step diverges. Expect ~10–15 min end-to-end including the executor’s 5-min onchain settlement cadence.
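
Because a missing export is an easy way to fail a 10–15 minute run, a guard in front of the script can catch it early. A minimal sketch, assuming the four variables exported above are the ones run-pilot.sh needs; `require_env` is a hypothetical name, not part of the pilot scripts.

```shell
# require_env: hypothetical guard. Names every variable that is unset or
# empty on stderr and returns non-zero if any was found.
require_env() {
  rc=0
  for name in "$@"; do
    # Indirect lookup in POSIX sh: read the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset or empty: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   require_env CONTROL_PLANE_IP KUMA_PAYOUT_PUBKEY MP3_PAYOUT_PUBKEY \
#     BACKUP_AGE_RECIPIENT && ./run-pilot.sh
```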

Success looks like:

================================================================
pilot end-to-end: PASS
================================================================
workload_id: <uuid>
status: succeeded
onchain_settled: 75 (or whatever credits the workload billed)
backup_objects: 1
================================================================

| What | How |
|---|---|
| Backup file | http://mp1:9001 → coledex-backups → see one *.pg.age file |
| Decrypt | mc cp local/coledex-backups/<file> /tmp && age --decrypt -i backup-customer.key /tmp/<file> > dump.pg && pg_restore --list dump.pg |
| Kuma’s onchain balance | spl-token accounts --owner $KUMA_PAYOUT_PUBKEY --url devnet |
| Settlement metrics | http://mp1:3000 → OpenInfra/Local Pilot dashboard |
| Raw logs | docker compose logs -f api-server on mp1; journalctl -u openinfra-agent -f on kuma |

| Symptom | Fix |
|---|---|
| Sim provider stuck restarting | docker compose logs agent-mp3-sim. Usually INVITE_TOKEN is missing or already consumed. Re-run ./run-pilot.sh to issue a fresh invite. |
| Workloads on sim can’t reach customer-pg / minio | They use CONTROL_PLANE_IP in env, not compose service names. Confirm your LAN IP is set correctly in run-pilot.sh. |
| TREASURY_PUBKEY= missing | SOLANA_MINT env was empty. Set it in docker-compose.yml, then recreate api-server: docker compose up -d --force-recreate api-server |
| Airdrop fails (429) | Devnet faucet rate limit. Wait 8 h or manually fund via solana airdrop 1 $TREASURY_PUBKEY --url devnet from another IP |
| Agent can’t pull image | /etc/docker/daemon.json insecure-registry entry missing on provider. Re-run provider-install.sh |
| Workload stuck scheduling | Agent not heartbeating. Check journalctl -u openinfra-agent on the provider; common cause: /etc/openinfra/agent.env malformed |
| Onchain never settles | Treasury has 0 SOL (can’t pay fees) or 0 OINFRA (no token to send). Verify with solana balance and spl-token accounts |
| pilot end-to-end: PASS then nothing in Minio | Workload exited fast but rclone rcat failed silently. Check docker logs openinfra-<workload-id> on kuma |
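
Since a malformed /etc/openinfra/agent.env is called out as a common cause of a silent agent, a quick syntax check can save a round of log reading. This is a rough sketch: `validate_env_file` is a hypothetical helper, and it only checks the KEY=VALUE line shape, not which keys the agent expects.

```shell
# validate_env_file: hypothetical check for KEY=VALUE files.
# Blank lines and '#' comments are allowed; every other line must start
# with NAME=. Prints offending lines and returns non-zero if any are found.
validate_env_file() {
  awk '
    /^[[:space:]]*$/ || /^[[:space:]]*#/ { next }
    !/^[A-Za-z_][A-Za-z0-9_]*=/ { printf "line %d: %s\n", NR, $0; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}

# Usage on the provider:
#   sudo cat /etc/openinfra/agent.env > /tmp/agent.env.check
#   validate_env_file /tmp/agent.env.check
```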

After the first passing run, leave the pilot running for 7 days:

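# The catalog manifest schedules at 0 3 * * * UTC. Let the cron
# trigger naturally, or force a run every hour for faster signal.
# The sleep runs after a failed run too, so a failure does not turn
# the loop into a tight retry.
while true; do ./run-pilot.sh || echo "run failed: $(date -u)" >&2; sleep 3600; done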

Watch openinfra_settlement_onchain_failed_total on the Grafana dashboard: any non-zero value during the soak is a real bug to fix before mainnet. The backup file count in Minio should match the number of successful runs.
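
For an unattended soak, the same counter can be checked from a script rather than by eye. A minimal sketch, assuming the counter is available in the Prometheus text exposition format without trailing timestamps; the helper name `failed_settlements` and the `METRICS_URL` variable are hypothetical.

```shell
# failed_settlements: hypothetical helper. Reads Prometheus text-format
# metrics on stdin and prints the sum of every sample of
# openinfra_settlement_onchain_failed_total, labelled or not.
# Assumes the sample value is the last field on the line (no timestamp).
failed_settlements() {
  awk '
    /^openinfra_settlement_onchain_failed_total([{ ]|$)/ { sum += $NF }
    END { printf "%d\n", sum }
  '
}

# Usage (METRICS_URL is a placeholder for wherever the api-server
# exposes its metrics):
#   n=$(curl -s "$METRICS_URL" | failed_settlements)
#   [ "$n" -eq 0 ] || echo "onchain settlement failures: $n" >&2
```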


cd ~/Documents/seppelabs/openinfra/deploy/local-pilot
docker compose down -v # -v wipes volumes (treasury, ledger, backups)
ssh kuma 'sudo systemctl disable --now openinfra-agent && sudo rm /etc/systemd/system/openinfra-agent.service && sudo systemctl daemon-reload'
# Simulated mp3 is part of the compose stack — `down -v` above
# already removed it; nothing to clean up remotely.

docker compose down (without -v) keeps the volumes so a re-up resumes with the same treasury keypair + ledger state.