Overview
Built tooling for local and CI test environments, covering automated multi-node network setup, upgrade orchestration, and the operational workflows used for release validation and end-to-end testing.
The Problem With Distributed System Tests
Unit tests are necessary but insufficient for distributed systems. A service can pass all its unit tests and still fail in integration because the network is flaky, a peer sends unexpected data, or a migration doesn’t handle concurrent traffic correctly.
The goal was a test suite that could catch these failures before they hit production.
Test Layers
The automation covers six distinct layers:
| Layer | What It Tests |
|---|---|
| Unit | Individual functions and types |
| Integration | Service behaviour against real dependencies (DB, cache) |
| System | Multi-service scenarios, happy paths and error paths |
| Simulation | Injected faults: network partitions, node crashes, slow peers |
| E2E | Full cluster scenarios from client to storage |
| Benchmark | Performance regression detection on hot paths |
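To make the Simulation row more concrete, here is a minimal sketch of fault injection using an in-memory network. Everything in it (`SimNetwork`, `Partition`, `Heal`, `Send`, the node names) is hypothetical and chosen for illustration; the project's actual simulation harness is not described here and may work quite differently.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrPartitioned is returned when a message would cross an injected partition.
var ErrPartitioned = errors.New("simnet: nodes are partitioned")

// SimNetwork is a hypothetical in-memory network that tests can inject faults into.
type SimNetwork struct {
	mu      sync.Mutex
	blocked map[[2]string]bool  // node pairs that currently cannot communicate
	inbox   map[string][]string // messages delivered to each node
}

func NewSimNetwork() *SimNetwork {
	return &SimNetwork{
		blocked: make(map[[2]string]bool),
		inbox:   make(map[string][]string),
	}
}

// pair orders the two names so a link is treated as bidirectional.
func pair(a, b string) [2]string {
	if a > b {
		a, b = b, a
	}
	return [2]string{a, b}
}

// Partition cuts the link between a and b in both directions.
func (n *SimNetwork) Partition(a, b string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.blocked[pair(a, b)] = true
}

// Heal restores the link between a and b.
func (n *SimNetwork) Heal(a, b string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	delete(n.blocked, pair(a, b))
}

// Send delivers msg to the recipient's inbox unless the link is partitioned.
func (n *SimNetwork) Send(from, to, msg string) error {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.blocked[pair(from, to)] {
		return ErrPartitioned
	}
	n.inbox[to] = append(n.inbox[to], msg)
	return nil
}

// Received returns a copy of the messages delivered to node so far.
func (n *SimNetwork) Received(node string) []string {
	n.mu.Lock()
	defer n.mu.Unlock()
	return append([]string(nil), n.inbox[node]...)
}

func main() {
	sim := NewSimNetwork()

	sim.Partition("node-1", "node-2")
	fmt.Println("during partition:", sim.Send("node-1", "node-2", "replicate k=v"))

	sim.Heal("node-1", "node-2")
	fmt.Println("after heal:", sim.Send("node-1", "node-2", "replicate k=v"))
	fmt.Println("node-2 inbox:", sim.Received("node-2"))
}
```

A test built on something like this can assert that the system under test retries or degrades gracefully while the partition is in place, then converges once it heals. Node crashes and slow peers could be modelled the same way, by dropping all traffic for a node or delaying delivery.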
Local Multi-Node Setup
A single CLI command spins up a configurable number of nodes locally, bootstraps the peer discovery network, and connects them into a test cluster. This lets engineers run multi-node scenarios on a laptop without needing a staging environment.
testnet up --nodes 5 --scenario replication-under-partition
Upgrade Orchestration
Schema migrations and binary upgrades are tested against a running cluster by upgrading nodes one at a time and verifying consistency at each step. This catches the class of bugs where new code makes assumptions about data written by old code.
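The one-node-at-a-time loop described above can be sketched roughly as follows. The `Node` interface, `RollingUpgrade`, `ErrInconsistent`, and the checksum-based check are all assumptions made for illustration, not the project's real API; the in-memory `fakeNode` stands in for a real cluster member.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrInconsistent is a hypothetical error a consistency check can return.
var ErrInconsistent = errors.New("cluster state diverged")

// Node is a hypothetical handle to one cluster member.
type Node interface {
	Name() string
	Upgrade(version string) error
}

// RollingUpgrade upgrades nodes one at a time and runs check after every
// step. It stops at the first failure so the remaining nodes stay on the
// old version and the cluster can be inspected in its mixed-version state.
func RollingUpgrade(nodes []Node, version string, check func() error) error {
	for i, n := range nodes {
		if err := n.Upgrade(version); err != nil {
			return fmt.Errorf("upgrade %s: %w", n.Name(), err)
		}
		if err := check(); err != nil {
			return fmt.Errorf("check after upgrading %s (%d/%d): %w",
				n.Name(), i+1, len(nodes), err)
		}
	}
	return nil
}

// fakeNode is an in-memory stand-in for a real node.
type fakeNode struct {
	name     string
	version  string
	checksum string // stand-in for a digest of the node's stored data
}

func (f *fakeNode) Name() string { return f.name }

func (f *fakeNode) Upgrade(version string) error {
	f.version = version
	return nil
}

func main() {
	fakes := []*fakeNode{
		{name: "node-1", version: "v1", checksum: "abc"},
		{name: "node-2", version: "v1", checksum: "abc"},
		{name: "node-3", version: "v1", checksum: "abc"},
	}
	nodes := make([]Node, len(fakes))
	for i, f := range fakes {
		nodes[i] = f
	}

	// The check compares data checksums across nodes; a real check would
	// query each node over the network.
	check := func() error {
		for _, f := range fakes[1:] {
			if f.checksum != fakes[0].checksum {
				return ErrInconsistent
			}
		}
		return nil
	}

	if err := RollingUpgrade(nodes, "v2", check); err != nil {
		fmt.Println("upgrade failed:", err)
		return
	}
	fmt.Println("all nodes upgraded to v2")
}
```

Stopping at the first failed check is a deliberate choice in this sketch: it preserves the exact mixed-version state in which old and new code disagreed, which is usually the state worth debugging.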
CI Integration
All layers run in CI. Unit and integration tests run on every PR. System, simulation, and E2E tests run on merge to main. Benchmarks run nightly and alert on regressions exceeding a configurable threshold.
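The nightly regression alert comes down to comparing a current measurement against a stored baseline. A minimal sketch, assuming results are recorded as nanoseconds per operation (the unit reported by `go test -bench`) and that the threshold is a percentage; the function names, benchmark names, and sample numbers below are hypothetical:

```go
package main

import "fmt"

// RegressionPct returns how much slower current is than baseline, as a
// percentage. Negative values mean the benchmark got faster.
func RegressionPct(baselineNs, currentNs float64) float64 {
	return (currentNs - baselineNs) / baselineNs * 100
}

// ExceedsThreshold reports whether the slowdown is larger than thresholdPct.
// A non-positive baseline is treated as "no usable baseline" and never alerts.
func ExceedsThreshold(baselineNs, currentNs, thresholdPct float64) bool {
	if baselineNs <= 0 {
		return false
	}
	return RegressionPct(baselineNs, currentNs) > thresholdPct
}

// result pairs one benchmark's stored baseline with tonight's measurement.
type result struct {
	name       string
	baselineNs float64
	currentNs  float64
}

func main() {
	const thresholdPct = 10 // configurable in a real setup

	// Illustrative numbers only, not measurements from the project.
	results := []result{
		{"BenchmarkEncode", 1000, 1050},
		{"BenchmarkReplicate", 2000, 2600},
	}

	for _, r := range results {
		if ExceedsThreshold(r.baselineNs, r.currentNs, thresholdPct) {
			fmt.Printf("ALERT %s: %.1f%% slower than baseline\n",
				r.name, RegressionPct(r.baselineNs, r.currentNs))
		}
	}
}
```

Benchmark timings are noisy, so in practice a comparison like this is usually run over several samples per benchmark rather than a single number.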
Technical Stack
Go · gomock · Docker Compose · GitHub Actions · testcontainers-go