Performance

Performance in a gateway is not a vanity metric — it is a constraint that shapes what the system above it can become. Every millisecond of gateway overhead is a millisecond stolen from the user’s perception of agent responsiveness. When a coding agent streams text deltas at sub-100ms intervals, the gateway must contribute negligible latency or the illusion of real-time interaction collapses. Diminuendo was not optimized after the fact. Its performance characteristics are a natural consequence of three architectural decisions made at the outset: an in-process runtime with native WebSocket support, in-process SQLite persistence with zero network hops, and a single-binary deployment with no middleware stack. This page presents measured results, not projections.

Multi-Environment Benchmarks

To validate that Diminuendo’s performance advantage is not an artifact of a single machine or operating system, benchmarks are now run on two distinct environments. Results are consistent across both — the architectural advantage is structural, not circumstantial.

devbox (EC2)

| Service | Port | Notes |
| --- | --- | --- |
| Podium Gateway | :5083 | Shared — both gateways route here |
| Podium Coordinator | :5082 | Shared |
| Ensemble | :5180 | Shared |
| Crescendo | :5000 | Next.js 16.1.6 on Bun (Turbopack dev) |
| Diminuendo | :8090 | Bun + Effect TS |

Platform: Amazon Linux 2023, x86_64, Bun 1.3.10

local (macOS arm64)

| Service | Port | Notes |
| --- | --- | --- |
| Podium Gateway | :5083 | Shared — both gateways route here |
| Podium Coordinator | :5082 | Shared |
| Ensemble | :5180 | Shared |
| Crescendo | :8002 | Next.js on Bun (dev/turbo) |
| Diminuendo | :8080 | Bun + Effect TS |

Platform: macOS, arm64, Bun 1.3.10

Both environments share the same backend services (Ensemble, Podium). 10 warmup iterations are discarded before measurement begins to eliminate JIT and cache cold-start effects. Since agent processing time is constant across both gateways, the measured delta is the gateway overhead — nothing else.
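
The statistics reported in the tables below (p50, p95, p99, mean) follow standard practice; as a rough sketch of the methodology, assuming a nearest-rank percentile and a fixed warmup discard (the actual gateway-bench harness may compute these differently):

```typescript
// Sketch of the percentile math behind the benchmark tables; the real
// gateway-bench harness may differ in detail.

// Nearest-rank percentile over a set of latency samples (in ms).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Drop warmup iterations so JIT and cache cold-start effects are excluded.
function measure(times: number[], warmup = 10) {
  const kept = times.slice(warmup);
  const mean = kept.reduce((a, b) => a + b, 0) / kept.length;
  return {
    p50: percentile(kept, 50),
    p95: percentile(kept, 95),
    p99: percentile(kept, 99),
    mean,
  };
}
```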

Health Endpoint

The health endpoint is the simplest possible probe: receive an HTTP request, check upstream availability, return a JSON response. It isolates the gateway’s per-request overhead with no session state, no authentication, and no database access.
100 iterations, 10 warmup

| Metric | Diminuendo | Crescendo | Speedup |
| --- | --- | --- | --- |
| p50 | 0.5 ms | 7.8 ms | 15.6x faster |
| p95 | 1.1 ms | 11.0 ms | 10.0x faster |
| p99 | 1.6 ms | 14.8 ms | 9.3x faster |
| mean | 0.6 ms | 8.3 ms | 13.8x faster |
| stddev | 0.3 ms | 1.7 ms | 5.7x tighter |
| RPS | 12,456 | 153 | 81.4x throughput |

On the devbox, the gap widens further — Diminuendo’s throughput advantage grows from 35.7x to 81.4x, and the p50 latency speedup jumps from 8.4x to 15.6x. Crescendo’s per-request overhead scales worse on EC2’s x86_64 architecture, while Diminuendo remains sub-millisecond regardless of environment. This confirms that the performance delta is architectural, not platform-specific. Crescendo checks four dependencies (PostgreSQL, Redis, Ensemble, Podium). Diminuendo checks two (Ensemble, Podium). Even accounting for two fewer sub-millisecond probes, the dominant cost is Next.js per-request middleware and routing overhead — a tax paid on every request regardless of handler complexity.
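
As an illustration of what a dependency-probing health handler does, here is a minimal sketch; the probe names and response shape are assumptions for illustration, not Diminuendo's actual API:

```typescript
// Illustrative sketch of health-check aggregation; names and shapes
// are assumptions, not Diminuendo's actual implementation.

type Probe = { name: string; ok: boolean; latencyMs: number };

// Aggregate upstream probe results into a single health payload.
function healthPayload(probes: Probe[]) {
  const healthy = probes.every((p) => p.ok);
  return {
    status: healthy ? "ok" : "degraded",
    checks: Object.fromEntries(probes.map((p) => [p.name, p.ok])),
  };
}

// In a Bun-based gateway this would be wired roughly as:
//   Bun.serve({ fetch: async () =>
//     Response.json(healthPayload(await probeAll())) });
```

The aggregation itself is microseconds of work; the measured gap comes from everything wrapped around it per request.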

Connection and Authentication

This scenario measures the time from WebSocket upgrade to authenticated identity. This is the latency a user experiences between clicking “connect” and seeing the application become interactive.
20 iterations

| Metric | Diminuendo | Crescendo | Speedup |
| --- | --- | --- | --- |
| p50 | 0.4 ms | 5.5 ms | 15.7x faster |
| p95 | 0.5 ms | 8.5 ms | 17.0x faster |

Diminuendo establishes a WebSocket and auto-authenticates in dev mode with zero I/O — the identity is synthesized in-process. Crescendo sends POST /api/e2e/seed which requires a PostgreSQL upsert round-trip, adding network latency and serialization overhead that Diminuendo avoids entirely.
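
The zero-I/O path can be sketched as follows; the identity fields here are illustrative placeholders, not Diminuendo's actual identity shape:

```typescript
// Sketch of dev-mode identity synthesis with zero I/O. Field names are
// illustrative assumptions, not Diminuendo's actual schema.

type Identity = { userId: string; tenantId: string; devMode: true };

// No database upsert, no network round-trip: the identity is constructed
// in-process at WebSocket upgrade time.
function synthesizeDevIdentity(connectionId: string): Identity {
  return {
    userId: `dev-${connectionId}`,
    tenantId: "dev-tenant",
    devMode: true,
  };
}
```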

Session Creation

The most demanding gateway operation short of a full agent turn: parse the request, validate permissions, generate a UUID, write to the registry database, construct the response, and broadcast a tenant-wide notification.
50 iterations, 10 warmup

| Metric | Diminuendo | Crescendo | Speedup |
| --- | --- | --- | --- |
| p50 | 0.6 ms | 17.7 ms | 27.6x faster |
| p95 | 0.9 ms | 24.8 ms | 27.6x faster |
| p99 | 0.9 ms | 51.9 ms | 57.7x faster |
| mean | 0.7 ms | 19.1 ms | 27.3x faster |
| stddev | 0.1 ms | 8.9 ms | 89x less variance |
| min | 0.5 ms | 10.9 ms | — |
| max | 0.9 ms | 75.9 ms | — |

The variance tells the story as clearly as the median. Diminuendo’s standard deviation of 0.1 ms reflects the deterministic cost of an in-process SQLite write. Crescendo’s 8.9 ms standard deviation — with a max of 75.9 ms — reflects the inherent jitter of PostgreSQL network round-trips, Redis publish fan-out, and garbage collection pauses in the Next.js runtime.
Crescendo’s p99 of 51.9 ms means that 1 in 100 session creations takes more than 57 times Diminuendo’s entire p99 response time. For an interactive application where users create sessions frequently, this tail latency is perceptible.
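
The session-create path described above can be sketched end to end; here an in-memory Map stands in for the SQLite registry and a callback stands in for the tenant broadcast, so all names are illustrative:

```typescript
// Sketch of the session-create path: generate an ID, write to the
// registry, broadcast a tenant-wide notification. The Map and callback
// are stand-ins for Diminuendo's SQLite registry and pub/sub.

import { randomUUID } from "node:crypto";

type Session = { id: string; tenantId: string; createdAt: number };

const registry = new Map<string, Session>();

function createSession(
  tenantId: string,
  broadcast: (event: { type: string; session: Session }) => void,
): Session {
  const session: Session = {
    id: randomUUID(),
    tenantId,
    createdAt: Date.now(),
  };
  registry.set(session.id, session); // in Diminuendo: a prepared SQLite INSERT
  broadcast({ type: "session.created", session }); // in-process pub/sub
  return session;
}
```

Every step is a same-process function call, which is why the cost is deterministic rather than jittery.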

Summary

| Metric | Diminuendo | Crescendo (local) | Crescendo (devbox) | Advantage |
| --- | --- | --- | --- | --- |
| Health p50 | 0.5–0.6 ms | 5.0 ms | 7.8 ms | 8.4–15.6x faster |
| Health RPS | 10,390–12,456 | 291 | 153 | 35.7–81.4x throughput |
| Auth/connect p50 | 0.4 ms | 5.5 ms | — | 15.7x faster |
| Session create p50 | 0.6 ms | 17.7 ms | — | 27.6x faster |
| Session create p95 | 0.9 ms | 24.8 ms | — | 27.6x faster |
| Session create jitter | 0.1 ms stddev | 8.9 ms stddev | — | 89x less variance |

Why Diminuendo Is Faster

These results are not the product of micro-optimization. They are the structural consequence of four architectural decisions, each of which removes an entire category of overhead:

Bun-Native Runtime

Bun’s native HTTP server and Effect TS replace the Next.js middleware stack — route matching, cookie parsing, CSRF middleware, session middleware, API route resolution. Each eliminated layer saves 0.5–2 ms per request. The cumulative effect is the ~4 ms gap visible in the health endpoint results.

Persistent WebSocket Transport

A WebSocket connection is established once and reused for the lifetime of the session. There is no per-request TCP handshake, no TLS renegotiation, no cookie parsing, no session lookup. Authentication cost is amortized to zero after the initial connect.
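
The amortization can be sketched as connection state that carries the identity resolved once at upgrade time; the names here are illustrative, not Diminuendo's actual types:

```typescript
// Sketch of amortizing authentication over a persistent connection:
// identity is resolved once at upgrade, and every subsequent message
// reads it from connection state. Names are illustrative.

type Conn = { identity: { userId: string } | null; authLookups: number };

function onUpgrade(conn: Conn, resolveIdentity: () => { userId: string }) {
  conn.authLookups += 1; // the only auth cost for the whole session
  conn.identity = resolveIdentity();
}

function onMessage(conn: Conn, msg: string): string {
  if (!conn.identity) throw new Error("unauthenticated");
  // No cookie parsing or session lookup here: identity is already attached.
  return `${conn.identity.userId}: ${msg}`;
}
```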

In-Process SQLite

Every database operation is a function call into bun:sqlite — no TCP connection, no query serialization, no result deserialization, no connection pool management. A prepared-statement INSERT executes in microseconds. The 10–15 ms gap in session creation reflects the PostgreSQL network round-trip that Diminuendo avoids entirely.

In-Process Pub/Sub

Bun’s built-in WebSocket pub/sub publishes to topic subscribers within the same process. No Redis PUBLISH, no serialization to the wire, no deserialization on the subscriber side. Each avoided hop saves 1–2 ms per broadcast event.
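
The essence of in-process delivery can be shown with a minimal topic map; this is an illustrative stand-in in the spirit of Bun's publish/subscribe, not Bun's actual API:

```typescript
// Minimal in-process topic pub/sub: delivery is a same-process function
// call with no serialization to the wire. Illustrative stand-in only.

type Handler = (message: string) => void;

const topics = new Map<string, Set<Handler>>();

function subscribe(topic: string, handler: Handler): void {
  if (!topics.has(topic)) topics.set(topic, new Set());
  topics.get(topic)!.add(handler);
}

function publish(topic: string, message: string): number {
  const subs = topics.get(topic);
  if (!subs) return 0;
  for (const h of subs) h(message); // direct call, no Redis PUBLISH hop
  return subs.size;
}
```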

Raw Backend Baselines

Direct health-check latency to the shared backend services (50 iterations), provided for reference. These represent the floor — all gateway overhead is additive on top.

| Backend | p50 | p95 |
| --- | --- | --- |
| Podium | 0.37 ms | 0.76 ms |
| Ensemble | 0.24 ms | 0.39 ms |

Diminuendo’s health endpoint (0.6 ms p50) adds approximately 0.2 ms of gateway overhead on top of the slowest backend probe. Crescendo’s health endpoint (5.0 ms p50) adds approximately 4.6 ms — the cost of the Next.js request pipeline.

Event Streaming Throughput

During an active agent turn, the gateway’s critical path is the event pipeline: receive a Podium event over WebSocket, map it to the client protocol, assign a sequence number, broadcast to all session subscribers, and optionally persist to SQLite. This pipeline processes hundreds of events per second during code generation — each text_delta carrying a few tokens of output. The two-worker SQLite architecture ensures that persistence never blocks the event pipeline:

| Component | Throughput | Bottleneck |
| --- | --- | --- |
| Event mapping | ~100,000 events/sec | CPU (JSON parse + construct) |
| Bun pub/sub broadcast | ~50,000 publishes/sec | Memory bandwidth |
| SQLite writer (batched) | ~5,000 writes/sec | fsync (WAL checkpoint) |
| SQLite reader | Concurrent with writer | WAL snapshot isolation |

The writer worker batches commands at 50 ms intervals or 100 commands, whichever comes first. At typical agent output rates (10–50 events/sec), the writer executes a single batched transaction every 50 ms containing all accumulated writes. The reader worker operates on a separate thread with WAL snapshot isolation, so history queries never block event persistence.

Reproducing

Local (macOS arm64)

```shell
# Prerequisites: Podium on :5083, Ensemble on :5180, both gateways running
cd ~/Projects/gateway-bench
bun install
bun run bench                            # all scenarios
bun run bench -- --scenarios health      # just health
bun run bench -- --scenarios session-create
```

devbox (EC2)

```shell
ssh dev "cd ~/apps/gateway-bench && PATH=\$HOME/.bun/bin:\$PATH bun run bench"
ssh dev "cd ~/apps/gateway-bench && PATH=\$HOME/.bun/bin:\$PATH bun run bench -- --scenarios health"
```

The devbox environment reads configuration from `~/apps/gateway-bench/.env`. The environment variables `CRESCENDO_BASE_URL`, `DIMINUENDO_BASE_URL`, and the service ports can be overridden there.

The benchmark script auto-detects whether services are running and starts them if needed. Results are written to stdout in a tabular format suitable for direct inclusion in documentation.
Latest benchmark run: 2026-03-04 — Bun 1.3.10, macOS arm64 + Amazon Linux 2023 x86_64