Diminuendo
In musical notation, diminuendo denotes a gradual decrease in volume — a controlled recession from complexity toward clarity. The gateway serves an analogous function in the iGentAI architecture: it reduces a distributed system of agent orchestrators, LLM inference engines, workspace filesystems, billing ledgers, and access control policies into a single, coherent WebSocket protocol that any client can consume with a few dozen lines of code.
Diminuendo is the sole entry point through which every frontend client — web browser, desktop application, CLI tool — communicates with the AI agent infrastructure. No client ever speaks directly to an agent orchestrator, an LLM inference service, or a workspace filesystem. The gateway is the narrow waist of the entire system: every byte of agent interaction passes through it, and it is responsible for ensuring that each byte arrives authenticated, authorized, rate-limited, persisted, and billed.
The Problem
AI coding agents are not request-response APIs. A single user interaction — “refactor this module to use dependency injection” — may produce dozens of real-time events over several minutes: thinking blocks as the model reasons, tool calls as it reads and writes files, terminal output as it runs tests, permission requests when it encounters sensitive operations, and a structured completion with token usage. All of this must stream to the client with sub-100ms latency for interactive use.
Beyond the wire protocol, a production gateway must solve several orthogonal problems simultaneously:
- Multi-tenant isolation — each organization’s sessions, history, and billing must be strictly separated, with no possibility of cross-tenant data leakage
- Session persistence — users expect to close their laptop, reopen it tomorrow, and resume exactly where they left off, with full conversation history and event replay
- Billing enforcement — credit reservations must be checked before an expensive LLM turn begins, not after
- Role-based access control — owners, admins, and members have different permissions over sessions, team management, and billing
- Connection resilience — clients disconnect, servers restart, agents crash; the system must recover gracefully from all of these without losing data
- Horizontal scalability — the gateway must scale to thousands of concurrent sessions without requiring shared state between instances
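The billing requirement in the list above (check the reservation before the turn starts, not after) can be illustrated with a small sketch. This is not Diminuendo's billing code: `CreditLedger`, `reserve`, `settle`, and `runTurn` are hypothetical names, the ledger lives in memory, and capping the charge at the reserved amount is an assumption made for the example.

```typescript
// Hypothetical in-memory ledger; the gateway's real billing API is not shown in this document.
class CreditLedger {
  private balance: number;
  private reserved = 0;

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  // Reserve credits up front; refuse when the unreserved balance is too low.
  reserve(amount: number): boolean {
    if (amount <= 0 || this.balance - this.reserved < amount) return false;
    this.reserved += amount;
    return true;
  }

  // Release the reservation and charge actual usage (capped at the reservation, by assumption).
  settle(reservedAmount: number, actualCost: number): void {
    this.reserved -= reservedAmount;
    this.balance -= Math.min(actualCost, reservedAmount);
  }

  // Credits that are neither spent nor held by an in-flight turn.
  available(): number {
    return this.balance - this.reserved;
  }
}

type TurnResult =
  | { ok: true; cost: number }
  | { ok: false; reason: "insufficient_credits" };

// The expensive work (`execute`) runs only after the reservation succeeds.
function runTurn(ledger: CreditLedger, estimate: number, execute: () => number): TurnResult {
  if (!ledger.reserve(estimate)) return { ok: false, reason: "insufficient_credits" };
  let cost = 0;
  try {
    cost = execute();
  } finally {
    // If the turn throws, cost stays 0 and the reservation is simply released.
    ledger.settle(estimate, cost);
  }
  return { ok: true, cost };
}
```

The point of the ordering is that a tenant without enough credits is rejected before any LLM call is made, so the refusal itself costs nothing.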
The Solution
Diminuendo is a lean, single-binary gateway — approximately 8,000 lines of Effect TS — running on Bun. It uses SQLite in WAL mode for zero-ops persistence: no Postgres cluster, no Redis dependency, no external state store. Each tenant gets its own SQLite database for session metadata; each session gets its own database for conversation history and events. This per-tenant data isolation means horizontal scaling requires nothing more than routing tenants to different gateway instances. The system also includes an automation engine for scheduled runs, heartbeats, and background triage, using the same event pipeline as interactive turns. Migrations are applied automatically on first access. There is no Docker image to pull, no Kubernetes manifest to deploy, no database provisioning step. Clone, install, run.
Position in the Platform
Diminuendo sits between frontend clients and three backend services, each responsible for a distinct domain:
| Service | Responsibility |
|---|---|
| Podium | Agent orchestration — creates agent instances, manages their lifecycle, routes messages to running agents, streams their output back |
| Ensemble | LLM inference — model selection, token accounting, provider failover, cost estimation |
| Chronicle | Workspace filesystem — content-addressed storage, file versioning, bidirectional sync with local filesystems |
Protocol at a Glance
The Diminuendo wire protocol is a typed, JSON-over-WebSocket protocol with 21 client message types and 51 server event types.
21 Client Messages
Session CRUD, turn execution, file access, team management, and connection lifecycle
51 Server Events
Streaming text, tool calls, thinking blocks, terminal output, sandbox lifecycle, billing updates, and more
7-State Machine
Each session transitions through: inactive, activating, ready, running, waiting, deactivating, error
Automation System
Scheduled automations, heartbeats, and inbox-driven background runs
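The seven session states listed above lend themselves to an explicit transition table. The sketch below uses the state names from this page, but the allowed edges are assumptions made for illustration; the document does not specify which transitions the gateway permits. `TRANSITIONS`, `canTransition`, and `transition` are hypothetical names.

```typescript
// State names are taken from this page.
type SessionState =
  | "inactive"
  | "activating"
  | "ready"
  | "running"
  | "waiting"
  | "deactivating"
  | "error";

// ASSUMED edges: a plausible lifecycle, not the gateway's documented behavior.
const TRANSITIONS: Record<SessionState, readonly SessionState[]> = {
  inactive: ["activating"],
  activating: ["ready", "error"],
  ready: ["running", "deactivating"],
  running: ["waiting", "ready", "error"],
  waiting: ["running", "deactivating", "error"],
  deactivating: ["inactive", "error"],
  error: ["inactive"],
};

// True when the assumed table allows moving from `from` to `to`.
function canTransition(from: SessionState, to: SessionState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new state, or throws on a transition the table does not allow.
function transition(from: SessionState, to: SessionState): SessionState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the table as data means a client can reject impossible state changes locally, and the compiler flags any state that is missing from the table.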
SDKs
Four official client SDKs provide typed wrappers around the wire protocol:
TypeScript
Zero-dependency, works in browsers and Node.js/Bun. Promise-based with typed event handlers.
Rust
Tokio-based async client with Stream for events. Designed for Tauri desktop apps and CLI tools.
Python
Asyncio-based client with callback event handling. Suitable for scripting, testing, and notebook integration.
Swift
Actor-based async/await client for macOS and iOS. Codable events with zero third-party dependencies.
Getting Started
Prerequisites
Diminuendo runs on Bun — a JavaScript runtime with native WebSocket and SQLite support. Install it by following the official Bun installation instructions. Bun 1.0 or later is required.
Clone with Submodules
The repository uses git submodules for its backend service dependencies (Podium, Ensemble, Chronicle). Clone with git clone --recurse-submodules to pull them in a single step.
Start the Gateway
In dev mode, every connection is authenticated as developer@example.com in the dev tenant. No Auth0 configuration, no JWT tokens, no environment variables required. The gateway creates a ./data directory on first run containing SQLite databases for tenant registries and session data. This directory is gitignored — delete it at any time to reset all state.
In production, the welcome message will have requiresAuth: true. You must send an authenticate message with a valid JWT before the gateway will accept any other messages. In dev mode, this step is skipped entirely.
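The handshake described above can be sketched from the client's side. Only the `requiresAuth` flag and the `authenticate` message name come from this page. The `type` discriminator, the `token` field, and the helper names `parseWelcome` and `firstClientMessage` are assumptions for illustration, so check the protocol reference for the actual message shapes.

```typescript
// `requiresAuth` is named in this document; the `type` field is an assumed discriminator.
interface WelcomeMessage {
  type: "welcome";
  requiresAuth: boolean;
}

// The `authenticate` message name is from this document; the `token` field is assumed.
interface AuthenticateMessage {
  type: "authenticate";
  token: string;
}

// Validate the first frame received from the gateway.
function parseWelcome(raw: string): WelcomeMessage {
  const msg: unknown = JSON.parse(raw);
  if (typeof msg !== "object" || msg === null) {
    throw new Error("expected a JSON object");
  }
  const rec = msg as Record<string, unknown>;
  const requiresAuth = rec["requiresAuth"];
  if (rec["type"] !== "welcome" || typeof requiresAuth !== "boolean") {
    throw new Error("expected a welcome message");
  }
  return { type: "welcome", requiresAuth };
}

// Decide what, if anything, must be sent before other messages are accepted.
function firstClientMessage(welcome: WelcomeMessage, jwt?: string): AuthenticateMessage | null {
  if (!welcome.requiresAuth) return null; // dev mode: no authentication step
  if (!jwt) {
    throw new Error("gateway requires authentication but no JWT was provided");
  }
  return { type: "authenticate", token: jwt };
}
```

A client would call `parseWelcome` on the first frame it receives and, when `firstClientMessage` returns a message, send it as JSON over the socket before anything else.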