Performance
Performance in a gateway is not a vanity metric; it is a constraint that shapes what the system above it can become. Every millisecond of gateway overhead is a millisecond stolen from the user’s perception of agent responsiveness. When a coding agent streams text deltas at sub-100 ms intervals, the gateway must contribute negligible latency or the illusion of real-time interaction collapses. Diminuendo was not optimized after the fact. Its performance characteristics are a natural consequence of three architectural decisions made at the outset: an in-process runtime with native WebSocket support, in-process SQLite persistence with zero network hops, and a single-binary deployment with no middleware stack. This page presents measured results, not projections.
Multi-Environment Benchmarks
To validate that Diminuendo’s performance advantage is not an artifact of a single machine or operating system, benchmarks are run in two distinct environments. Results are consistent across both, which indicates that the advantage is structural rather than circumstantial.
devbox (EC2)
| Service | Port | Notes |
|---|---|---|
| Podium Gateway | :5083 | Shared — both gateways route here |
| Podium Coordinator | :5082 | Shared |
| Ensemble | :5180 | Shared |
| Crescendo | :5000 | Next.js 16.1.6 on Bun (Turbopack dev) |
| Diminuendo | :8090 | Bun + Effect TS |
local (macOS arm64)
| Service | Port | Notes |
|---|---|---|
| Podium Gateway | :5083 | Shared — both gateways route here |
| Podium Coordinator | :5082 | Shared |
| Ensemble | :5180 | Shared |
| Crescendo | :8002 | Next.js on Bun (dev/turbo) |
| Diminuendo | :8080 | Bun + Effect TS |
Both environments run the same backend services (Ensemble, Podium), and both gateways route to them. Ten warmup iterations are discarded before measurement begins to eliminate JIT and cache cold-start effects. Because agent processing time is the same behind both gateways, the measured delta is gateway overhead and nothing else.
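The statistics reported in the tables on this page (percentiles, mean, standard deviation) can be derived from raw latency samples along the following lines. This is a minimal sketch, not the project's benchmark harness: the names `summarize` and `percentile` are hypothetical, and the nearest-rank percentile method and population standard deviation are assumptions, since the source does not say which definitions were used.

```typescript
// Sketch only: hypothetical helper names, assumed nearest-rank percentiles.
interface LatencySummary {
  p50: number;
  p95: number;
  p99: number;
  mean: number;
  stddev: number;
  min: number;
  max: number;
}

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Drops the first `warmup` samples, then summarizes the rest.
function summarize(samplesMs: number[], warmup: number): LatencySummary {
  const measured = samplesMs.slice(warmup);
  if (measured.length === 0) {
    throw new Error("no samples left after discarding warmup iterations");
  }
  const sorted = [...measured].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, x) => sum + x, 0) / sorted.length;
  // Population variance (divides by n); an assumption for this sketch.
  const variance =
    sorted.reduce((sum, x) => sum + (x - mean) ** 2, 0) / sorted.length;
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    mean,
    stddev: Math.sqrt(variance),
    min: sorted[0],
    max: sorted[sorted.length - 1],
  };
}

// Two slow warmup samples followed by ten measured samples.
const summary = summarize([50, 40, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 2);
```

In a real run, each sample would be the elapsed time around one request, for example the difference between two `performance.now()` readings.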
Health Endpoint
The health endpoint is the simplest possible probe: receive an HTTP request, check upstream availability, return a JSON response. It isolates the gateway’s per-request overhead with no session state, no authentication, and no database access.
100 iterations, 10 warmup
The table below shows the devbox (EC2 x86_64) run; the local (macOS arm64) figures appear in the Summary.
| Statistic | Diminuendo | Crescendo | Speedup |
|---|---|---|---|
| p50 | 0.5 ms | 7.8 ms | 15.6x faster |
| p95 | 1.1 ms | 11.0 ms | 10.0x faster |
| p99 | 1.6 ms | 14.8 ms | 9.3x faster |
| mean | 0.6 ms | 8.3 ms | 13.8x faster |
| stddev | 0.3 ms | 1.7 ms | 5.7x tighter |
| RPS | 12,456 | 153 | 81.4x throughput |
Connection and Authentication
Measures the time from WebSocket upgrade to authenticated identity. This is the latency a user experiences between clicking “connect” and seeing the application become interactive.
20 iterations
| Statistic | Diminuendo | Crescendo | Speedup |
|---|---|---|---|
| p50 | 0.4 ms | 5.5 ms | 15.7x faster |
| p95 | 0.5 ms | 8.5 ms | 17.0x faster |
The Crescendo path goes through `POST /api/e2e/seed`, which requires a PostgreSQL upsert round-trip, adding network latency and serialization overhead that Diminuendo avoids entirely.
Session Creation
The most demanding gateway operation short of a full agent turn: parse the request, validate permissions, generate a UUID, write to the registry database, construct the response, and broadcast a tenant-wide notification.
50 iterations, 10 warmup
| Statistic | Diminuendo | Crescendo | Speedup |
|---|---|---|---|
| p50 | 0.6 ms | 17.7 ms | 27.6x faster |
| p95 | 0.9 ms | 24.8 ms | 27.6x faster |
| p99 | 0.9 ms | 51.9 ms | 57.7x faster |
| mean | 0.7 ms | 19.1 ms | 27.3x faster |
| stddev | 0.1 ms | 8.9 ms | 89x less variance |
| min | 0.5 ms | 10.9 ms | |
| max | 0.9 ms | 75.9 ms | |
Summary
| Metric | Diminuendo | Crescendo (local) | Crescendo (devbox) | Advantage |
|---|---|---|---|---|
| Health p50 | 0.5–0.6 ms | 5.0 ms | 7.8 ms | 8.4–15.6x faster |
| Health RPS | 10,390–12,456 | 291 | 153 | 35.7–81.4x throughput |
| Auth/connect p50 | 0.4 ms | 5.5 ms | — | 15.7x faster |
| Session create p50 | 0.6 ms | 17.7 ms | — | 27.6x faster |
| Session create p95 | 0.9 ms | 24.8 ms | — | 27.6x faster |
| Session create jitter | 0.1 ms stddev | 8.9 ms stddev | — | 89x less variance |
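The speedup and throughput columns are simple ratios. The sketch below recomputes two of them from the devbox health figures quoted on this page; the function names are hypothetical. Ratios recomputed from the rounded table values can differ slightly from the published column, presumably because the published ratios were computed from unrounded measurements (an assumption, not something the source states).

```typescript
// Sketch only: hypothetical helper names.

// For "lower is better" metrics such as latency: baseline divided by candidate.
function latencySpeedup(baselineMs: number, candidateMs: number): number {
  return baselineMs / candidateMs;
}

// For "higher is better" metrics such as requests per second.
function throughputGain(candidateRps: number, baselineRps: number): number {
  return candidateRps / baselineRps;
}

// Health p50 on devbox: Crescendo 7.8 ms vs Diminuendo 0.5 ms.
const healthP50Speedup = latencySpeedup(7.8, 0.5).toFixed(1); // "15.6"

// Health RPS on devbox: Diminuendo 12,456 vs Crescendo 153.
const healthRpsGain = throughputGain(12456, 153).toFixed(1); // "81.4"
```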
Why Diminuendo Is Faster
These results are not the product of micro-optimization. They are the structural consequence of four architectural decisions, each of which removes an entire category of overhead:
Bun-Native Runtime
Bun’s native HTTP server and Effect TS replace the Next.js middleware stack: route matching, cookie parsing, CSRF middleware, session middleware, and API route resolution. Each eliminated layer saves 0.5–2 ms per request. The cumulative effect is the roughly 4–7 ms p50 gap visible in the health endpoint results (about 4 ms locally, about 7 ms on devbox).
Persistent WebSocket Transport
A WebSocket connection is established once and reused for the lifetime of the session. There is no per-request TCP handshake, no TLS renegotiation, no cookie parsing, no session lookup. Authentication cost is amortized to zero after the initial connect.
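The amortization argument can be illustrated without any transport at all. The sketch below is a toy model, not Diminuendo's code: `AuthenticatedConnection`, `Identity`, and the token value are hypothetical. It shows the shape of the idea: authentication runs once at connect time, and every later message reuses the cached identity.

```typescript
// Sketch only: hypothetical types that model a long-lived connection.
interface Identity {
  userId: string;
  tenantId: string;
}

type Authenticator = (token: string) => Identity | null;

class AuthenticatedConnection {
  private identity: Identity | null = null;
  private readonly authenticate: Authenticator;

  constructor(authenticate: Authenticator) {
    this.authenticate = authenticate;
  }

  // Runs once, right after the connection is established.
  connect(token: string): boolean {
    this.identity = this.authenticate(token);
    return this.identity !== null;
  }

  // Later messages reuse the cached identity: no token check, no session lookup.
  handle(message: string): string {
    if (this.identity === null) {
      throw new Error("connection is not authenticated");
    }
    return `${this.identity.userId}:${message}`;
  }
}

// Count how often the (stand-in) authenticator actually runs.
let authCalls = 0;
const connection = new AuthenticatedConnection((token) => {
  authCalls += 1;
  return token === "valid-token" ? { userId: "u1", tenantId: "t1" } : null;
});

const connected = connection.connect("valid-token");
const replies = ["a", "b", "c"].map((m) => connection.handle(m));
```

After three messages the authenticator has run exactly once, which is the sense in which the per-message authentication cost approaches zero over the lifetime of a session.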
In-Process SQLite
Every database operation is a function call into `bun:sqlite`: no TCP connection, no query serialization, no result deserialization, no connection pool management. A prepared-statement INSERT executes in microseconds. The 10–15 ms gap in session creation reflects the PostgreSQL network round-trip that Diminuendo avoids entirely.
In-Process Pub/Sub
Bun’s built-in WebSocket pub/sub publishes to topic subscribers within the same process. No Redis `PUBLISH`, no serialization to the wire, no deserialization on the subscriber side. Each avoided hop saves 1–2 ms per broadcast event.
Raw Backend Baselines
Direct health-check latency to the shared backend services (50 iterations), provided for reference. These represent the floor; all gateway overhead is additive on top.
| Backend | p50 | p95 |
|---|---|---|
| Podium | 0.37 ms | 0.76 ms |
| Ensemble | 0.24 ms | 0.39 ms |
Event Streaming Throughput
During an active agent turn, the gateway’s critical path is the event pipeline: receive a Podium event over WebSocket, map it to the client protocol, assign a sequence number, broadcast to all session subscribers, and optionally persist to SQLite. This pipeline processes hundreds of events per second during code generation, each `text_delta` carrying a few tokens of output.
The two-worker SQLite architecture ensures that persistence never blocks the event pipeline:
| Component | Throughput | Bottleneck |
|---|---|---|
| Event mapping | ~100,000 events/sec | CPU (JSON parse + construct) |
| Bun pub/sub broadcast | ~50,000 publishes/sec | Memory bandwidth |
| SQLite writer (batched) | ~5,000 writes/sec | fsync (WAL checkpoint) |
| SQLite reader | Concurrent with writer | WAL snapshot isolation |
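The pipeline stages described above (map, sequence, broadcast, batched persistence) can be sketched in plain TypeScript. This is an illustrative model under stated assumptions, not the actual implementation: `SessionPipeline`, `ClientEvent`, and the `persist` callback are hypothetical names, subscribers are ordinary in-process callbacks standing in for Bun's pub/sub, and `persist` is a synchronous function here, whereas the real system hands writes to a separate SQLite writer.

```typescript
// Sketch only: hypothetical names; in-memory stand-ins for pub/sub and SQLite.
interface ClientEvent {
  sessionId: string;
  seq: number;
  type: string;
  payload: unknown;
}

type Subscriber = (event: ClientEvent) => void;
type PersistBatch = (batch: ClientEvent[]) => void;

class SessionPipeline {
  private seq = 0;
  private pending: ClientEvent[] = [];
  private readonly subscribers = new Set<Subscriber>();
  private readonly sessionId: string;
  private readonly batchSize: number;
  private readonly persist: PersistBatch;

  constructor(sessionId: string, batchSize: number, persist: PersistBatch) {
    this.sessionId = sessionId;
    this.batchSize = batchSize;
    this.persist = persist;
  }

  // Registers a subscriber and returns a function that removes it again.
  subscribe(fn: Subscriber): () => void {
    this.subscribers.add(fn);
    return () => {
      this.subscribers.delete(fn);
    };
  }

  // Map the raw upstream event, assign a sequence number, broadcast, enqueue.
  ingest(raw: string): ClientEvent {
    const parsed = JSON.parse(raw) as { type: string; payload?: unknown };
    this.seq += 1;
    const event: ClientEvent = {
      sessionId: this.sessionId,
      seq: this.seq,
      type: parsed.type,
      payload: parsed.payload ?? null,
    };
    this.subscribers.forEach((fn) => fn(event));
    this.pending.push(event);
    if (this.pending.length >= this.batchSize) this.flush();
    return event;
  }

  // Hands all buffered events to the writer as one batch.
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.persist(batch);
  }
}

// Usage: three text deltas, batches of two, one final flush.
const receivedSeqs: number[] = [];
const batchSizes: number[] = [];
const pipeline = new SessionPipeline("s1", 2, (batch) => {
  batchSizes.push(batch.length);
});
pipeline.subscribe((event) => {
  receivedSeqs.push(event.seq);
});
for (const text of ["fn", " main", "()"]) {
  pipeline.ingest(JSON.stringify({ type: "text_delta", payload: { text } }));
}
pipeline.flush();
```

Subscribers see every event immediately and in sequence order, while the writer receives events in batches. Batching is what lets a writer limited to a few thousand writes per second keep up with a much faster broadcast path.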
Reproducing
Local (macOS arm64)
devbox (EC2)
The devbox environment reads configuration from `~/apps/gateway-bench/.env`. The environment variables `CRESCENDO_BASE_URL` and `DIMINUENDO_BASE_URL`, as well as the service ports, can be overridden there.
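A benchmark script might resolve these overrides roughly as follows. This is a hypothetical sketch: `resolveBenchConfig` and `BenchConfig` are not part of the project, the default ports are taken from the devbox service table on this page, and the `localhost` host is an assumption.

```typescript
// Sketch only: hypothetical helper; defaults assume devbox ports and localhost.
interface BenchConfig {
  crescendoBaseUrl: string;
  diminuendoBaseUrl: string;
}

const DEVBOX_DEFAULTS: BenchConfig = {
  crescendoBaseUrl: "http://localhost:5000", // Crescendo port from the devbox table
  diminuendoBaseUrl: "http://localhost:8090", // Diminuendo port from the devbox table
};

// Environment values win over defaults when they are set.
function resolveBenchConfig(
  env: Record<string, string | undefined>,
  defaults: BenchConfig = DEVBOX_DEFAULTS,
): BenchConfig {
  return {
    crescendoBaseUrl: env["CRESCENDO_BASE_URL"] ?? defaults.crescendoBaseUrl,
    diminuendoBaseUrl: env["DIMINUENDO_BASE_URL"] ?? defaults.diminuendoBaseUrl,
  };
}

// Override only the Diminuendo URL; Crescendo falls back to the default.
const benchConfig = resolveBenchConfig({
  DIMINUENDO_BASE_URL: "http://localhost:8080",
});
```

In a Bun or Node script the first argument would typically be `process.env`, after the `.env` file has been loaded.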