Performance

Performance in a gateway is not a vanity metric — it is a constraint that shapes what the system above it can become. Every millisecond of gateway overhead is a millisecond stolen from the user’s perception of agent responsiveness. When a coding agent streams text deltas at sub-100ms intervals, the gateway must contribute negligible latency or the illusion of real-time interaction collapses. Diminuendo was not optimized after the fact. Its performance characteristics are a natural consequence of three architectural decisions made at the outset: an in-process runtime with native WebSocket support, in-process SQLite persistence with zero network hops, and a single-binary deployment with no middleware stack. This page presents measured results, not projections.

Multi-Environment Benchmarks

To validate that Diminuendo’s performance advantage is not an artifact of a single machine or operating system, benchmarks are now run on two distinct environments. Results are consistent across both — the architectural advantage is structural, not circumstantial.

devbox (EC2)

| Service | Port | Notes |
| --- | --- | --- |
| Podium Gateway | :5083 | Shared — both gateways route here |
| Podium Coordinator | :5082 | Shared |
| Ensemble | :5180 | Shared |
| Crescendo | :5000 | Next.js 16.1.6 on Bun (Turbopack dev) |
| Diminuendo | :8090 | Bun + Effect TS |

Platform: Amazon Linux 2023, x86_64, Bun 1.3.10

local (macOS arm64)

| Service | Port | Notes |
| --- | --- | --- |
| Podium Gateway | :5083 | Shared — both gateways route here |
| Podium Coordinator | :5082 | Shared |
| Ensemble | :5180 | Shared |
| Crescendo | :8002 | Next.js on Bun (dev/turbo) |
| Diminuendo | :8080 | Bun + Effect TS |

Platform: macOS, arm64, Bun 1.3.10

Both environments share the same backend services (Ensemble, Podium). 10 warmup iterations are discarded before measurement begins to eliminate JIT and cache cold-start effects. Since agent processing time is constant across both gateways, the measured delta is the gateway overhead — nothing else.
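
The statistics reported in the tables below (p50, p95, p99, mean) follow standard practice; as a rough sketch of the methodology, assuming a nearest-rank percentile and a fixed warmup discard (the actual gateway-bench harness may compute these differently):

```typescript
// Sketch of the percentile math behind the benchmark tables; the real
// gateway-bench harness may differ in detail.

// Nearest-rank percentile over a set of latency samples (in ms).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Drop warmup iterations so JIT and cache cold-start effects are excluded.
function measure(times: number[], warmup = 10) {
  const kept = times.slice(warmup);
  const mean = kept.reduce((a, b) => a + b, 0) / kept.length;
  return {
    p50: percentile(kept, 50),
    p95: percentile(kept, 95),
    p99: percentile(kept, 99),
    mean,
  };
}
```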

Health Endpoint

The health endpoint is the simplest possible probe: receive an HTTP request, check upstream availability, return a JSON response. It isolates the gateway’s per-request overhead with no session state, no authentication, and no database access.
100 iterations, 10 warmup

| Metric | Diminuendo | Crescendo | Speedup |
| --- | --- | --- | --- |
| p50 | 0.5 ms | 7.8 ms | 15.6x faster |
| p95 | 1.1 ms | 11.0 ms | 10.0x faster |
| p99 | 1.6 ms | 14.8 ms | 9.3x faster |
| mean | 0.6 ms | 8.3 ms | 13.8x faster |
| stddev | 0.3 ms | 1.7 ms | 5.7x tighter |
| RPS | 12,456 | 153 | 81.4x throughput |

On the devbox, the gap widens further — Diminuendo’s throughput advantage grows from 35.7x to 81.4x, and the p50 latency speedup jumps from 8.4x to 15.6x. Crescendo’s per-request overhead scales worse on EC2’s x86_64 architecture, while Diminuendo remains sub-millisecond regardless of environment. This confirms that the performance delta is architectural, not platform-specific. Crescendo checks four dependencies (PostgreSQL, Redis, Ensemble, Podium). Diminuendo checks two (Ensemble, Podium). Even accounting for two fewer sub-millisecond probes, the dominant cost is Next.js per-request middleware and routing overhead — a tax paid on every request regardless of handler complexity.
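
As an illustration of what a dependency-probing health handler does, here is a minimal sketch; the probe names and response shape are assumptions for illustration, not Diminuendo's actual API:

```typescript
// Illustrative sketch of health-check aggregation; names and shapes
// are assumptions, not Diminuendo's actual implementation.

type Probe = { name: string; ok: boolean; latencyMs: number };

// Aggregate upstream probe results into a single health payload.
function healthPayload(probes: Probe[]) {
  const healthy = probes.every((p) => p.ok);
  return {
    status: healthy ? "ok" : "degraded",
    checks: Object.fromEntries(probes.map((p) => [p.name, p.ok])),
  };
}

// In a Bun-based gateway this would be wired roughly as:
//   Bun.serve({ fetch: async () =>
//     Response.json(healthPayload(await probeAll())) });
```

The aggregation itself is microseconds of work; the measured gap comes from everything wrapped around it per request.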

Connection and Authentication

This scenario measures the time from WebSocket upgrade to authenticated identity. This is the latency a user experiences between clicking “connect” and seeing the application become interactive.
20 iterations

| Metric | Diminuendo | Crescendo | Speedup |
| --- | --- | --- | --- |
| p50 | 0.4 ms | 5.5 ms | 15.7x faster |
| p95 | 0.5 ms | 8.5 ms | 17.0x faster |

Diminuendo establishes a WebSocket and auto-authenticates in dev mode with zero I/O — the identity is synthesized in-process. Crescendo sends POST /api/e2e/seed which requires a PostgreSQL upsert round-trip, adding network latency and serialization overhead that Diminuendo avoids entirely.
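
The zero-I/O path can be sketched as follows; the identity fields here are illustrative placeholders, not Diminuendo's actual identity shape:

```typescript
// Sketch of dev-mode identity synthesis with zero I/O. Field names are
// illustrative assumptions, not Diminuendo's actual schema.

type Identity = { userId: string; tenantId: string; devMode: true };

// No database upsert, no network round-trip: the identity is constructed
// in-process at WebSocket upgrade time.
function synthesizeDevIdentity(connectionId: string): Identity {
  return {
    userId: `dev-${connectionId}`,
    tenantId: "dev-tenant",
    devMode: true,
  };
}
```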

Session Creation

The most demanding gateway operation short of a full agent turn: parse the request, validate permissions, generate a UUID, write to the registry database, construct the response, and broadcast a tenant-wide notification.
50 iterations, 10 warmup

| Metric | Diminuendo | Crescendo | Speedup |
| --- | --- | --- | --- |
| p50 | 0.6 ms | 17.7 ms | 27.6x faster |
| p95 | 0.9 ms | 24.8 ms | 27.6x faster |
| p99 | 0.9 ms | 51.9 ms | 57.7x faster |
| mean | 0.7 ms | 19.1 ms | 27.3x faster |
| stddev | 0.1 ms | 8.9 ms | 89x less variance |
| min | 0.5 ms | 10.9 ms | — |
| max | 0.9 ms | 75.9 ms | — |

The variance tells the story as clearly as the median. Diminuendo’s standard deviation of 0.1 ms reflects the deterministic cost of an in-process SQLite write. Crescendo’s 8.9 ms standard deviation — with a max of 75.9 ms — reflects the inherent jitter of PostgreSQL network round-trips, Redis publish fan-out, and garbage collection pauses in the Next.js runtime.
Crescendo’s p99 of 51.9 ms means that 1 in 100 session creations takes more than 57 times Diminuendo’s entire p99 response time. For an interactive application where users create sessions frequently, this tail latency is perceptible.
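
The session-create path described above can be sketched end to end; here an in-memory Map stands in for the SQLite registry and a callback stands in for the tenant broadcast, so all names are illustrative:

```typescript
// Sketch of the session-create path: generate an ID, write to the
// registry, broadcast a tenant-wide notification. The Map and callback
// are stand-ins for Diminuendo's SQLite registry and pub/sub.

import { randomUUID } from "node:crypto";

type Session = { id: string; tenantId: string; createdAt: number };

const registry = new Map<string, Session>();

function createSession(
  tenantId: string,
  broadcast: (event: { type: string; session: Session }) => void,
): Session {
  const session: Session = {
    id: randomUUID(),
    tenantId,
    createdAt: Date.now(),
  };
  registry.set(session.id, session); // in Diminuendo: a prepared SQLite INSERT
  broadcast({ type: "session.created", session }); // in-process pub/sub
  return session;
}
```

Every step is a same-process function call, which is why the cost is deterministic rather than jittery.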

Summary

| Metric | Diminuendo | Crescendo (local) | Crescendo (devbox) | Advantage |
| --- | --- | --- | --- | --- |
| Health p50 | 0.5–0.6 ms | 5.0 ms | 7.8 ms | 8.4–15.6x faster |
| Health RPS | 10,390–12,456 | 291 | 153 | 35.7–81.4x throughput |
| Auth/connect p50 | 0.4 ms | 5.5 ms | — | 15.7x faster |
| Session create p50 | 0.6 ms | 17.7 ms | — | 27.6x faster |
| Session create p95 | 0.9 ms | 24.8 ms | — | 27.6x faster |
| Session create jitter | 0.1 ms stddev | 8.9 ms stddev | — | 89x less variance |

Why Diminuendo Is Faster

These results are not the product of micro-optimization. They are the structural consequence of four architectural decisions, each of which removes an entire category of overhead:

Bun-Native Runtime

Bun’s native HTTP server and Effect TS replace the Next.js middleware stack — route matching, cookie parsing, CSRF middleware, session middleware, API route resolution. Each eliminated layer saves 0.5–2 ms per request. The cumulative effect is the ~4 ms gap visible in the health endpoint results.

Persistent WebSocket Transport

A WebSocket connection is established once and reused for the lifetime of the session. There is no per-request TCP handshake, no TLS renegotiation, no cookie parsing, no session lookup. Authentication cost is amortized to zero after the initial connect.
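
The amortization can be sketched as connection state that carries the identity resolved once at upgrade time; the names here are illustrative, not Diminuendo's actual types:

```typescript
// Sketch of amortizing authentication over a persistent connection:
// identity is resolved once at upgrade, and every subsequent message
// reads it from connection state. Names are illustrative.

type Conn = { identity: { userId: string } | null; authLookups: number };

function onUpgrade(conn: Conn, resolveIdentity: () => { userId: string }) {
  conn.authLookups += 1; // the only auth cost for the whole session
  conn.identity = resolveIdentity();
}

function onMessage(conn: Conn, msg: string): string {
  if (!conn.identity) throw new Error("unauthenticated");
  // No cookie parsing or session lookup here: identity is already attached.
  return `${conn.identity.userId}: ${msg}`;
}
```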

In-Process SQLite

Every database operation is a function call into bun:sqlite — no TCP connection, no query serialization, no result deserialization, no connection pool management. A prepared-statement INSERT executes in microseconds. The 10–15 ms gap in session creation reflects the PostgreSQL network round-trip that Diminuendo avoids entirely.

In-Process Pub/Sub

Bun’s built-in WebSocket pub/sub publishes to topic subscribers within the same process. No Redis PUBLISH, no serialization to the wire, no deserialization on the subscriber side. Each avoided hop saves 1–2 ms per broadcast event.
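
The essence of in-process delivery can be shown with a minimal topic map; this is an illustrative stand-in in the spirit of Bun's publish/subscribe, not Bun's actual API:

```typescript
// Minimal in-process topic pub/sub: delivery is a same-process function
// call with no serialization to the wire. Illustrative stand-in only.

type Handler = (message: string) => void;

const topics = new Map<string, Set<Handler>>();

function subscribe(topic: string, handler: Handler): void {
  if (!topics.has(topic)) topics.set(topic, new Set());
  topics.get(topic)!.add(handler);
}

function publish(topic: string, message: string): number {
  const subs = topics.get(topic);
  if (!subs) return 0;
  for (const h of subs) h(message); // direct call, no Redis PUBLISH hop
  return subs.size;
}
```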

Raw Backend Baselines

Direct health-check latency to the shared backend services (50 iterations), provided for reference. These represent the floor — all gateway overhead is additive on top.

| Backend | p50 | p95 |
| --- | --- | --- |
| Podium | 0.37 ms | 0.76 ms |
| Ensemble | 0.24 ms | 0.39 ms |

Diminuendo’s health endpoint (0.6 ms p50) adds approximately 0.2 ms of gateway overhead on top of the slowest backend probe. Crescendo’s health endpoint (5.0 ms p50) adds approximately 4.6 ms — the cost of the Next.js request pipeline.

Event Streaming Throughput

During an active agent turn, the gateway’s critical path is the event pipeline: receive a Podium event over WebSocket, map it to the client protocol, assign a sequence number, broadcast to all session subscribers, and optionally persist to SQLite. This pipeline processes hundreds of events per second during code generation — each text_delta carrying a few tokens of output. The two-worker SQLite architecture ensures that persistence never blocks the event pipeline:

| Component | Throughput | Bottleneck |
| --- | --- | --- |
| Event mapping | ~100,000 events/sec | CPU (JSON parse + construct) |
| Bun pub/sub broadcast | ~50,000 publishes/sec | Memory bandwidth |
| SQLite writer (batched) | ~5,000 writes/sec | fsync (WAL checkpoint) |
| SQLite reader | Concurrent with writer | WAL snapshot isolation |

The writer worker batches commands at 50 ms intervals or 100 commands, whichever comes first. At typical agent output rates (10–50 events/sec), the writer executes a single batched transaction every 50 ms containing all accumulated writes. The reader worker operates on a separate thread with WAL snapshot isolation, so history queries never block event persistence.

Reproducing

Local (macOS arm64)

```shell
# Prerequisites: Podium on :5083, Ensemble on :5180, both gateways running
cd ~/Projects/gateway-bench
bun install
bun run bench                            # all scenarios
bun run bench -- --scenarios health      # just health
bun run bench -- --scenarios session-create
```

devbox (EC2)

```shell
ssh dev "cd ~/apps/gateway-bench && PATH=\$HOME/.bun/bin:\$PATH bun run bench"
ssh dev "cd ~/apps/gateway-bench && PATH=\$HOME/.bun/bin:\$PATH bun run bench -- --scenarios health"
```

The devbox environment reads configuration from `~/apps/gateway-bench/.env`. The environment variables `CRESCENDO_BASE_URL`, `DIMINUENDO_BASE_URL`, and the service ports can be overridden there.

The benchmark script auto-detects whether services are running and starts them if needed. Results are written to stdout in a tabular format suitable for direct inclusion in documentation.
Latest benchmark run: 2026-03-04 — Bun 1.3.10, macOS arm64 + Amazon Linux 2023 x86_64