Diminuendo

In musical notation, diminuendo denotes a gradual decrease in volume — a controlled recession from complexity toward clarity. The gateway serves an analogous function in the iGentAI architecture: it reduces a distributed system of agent orchestrators, LLM inference engines, workspace filesystems, billing ledgers, and access control policies into a single, coherent WebSocket protocol that any client can consume with a few dozen lines of code. Diminuendo is the sole entry point through which every frontend client — web browser, desktop application, CLI tool — communicates with the AI agent infrastructure. No client ever speaks directly to an agent orchestrator, an LLM inference service, or a workspace filesystem. The gateway is the narrow waist of the entire system: every byte of agent interaction passes through it, and it is responsible for ensuring that each byte arrives authenticated, authorized, rate-limited, persisted, and billed.

The Problem

AI coding agents are not request-response APIs. A single user interaction — “refactor this module to use dependency injection” — may produce dozens of real-time events over several minutes: thinking blocks as the model reasons, tool calls as it reads and writes files, terminal output as it runs tests, permission requests when it encounters sensitive operations, and a structured completion with token usage. All of this must stream to the client with sub-100ms latency for interactive use. Beyond the wire protocol, a production gateway must solve several orthogonal problems simultaneously:
  • Multi-tenant isolation — each organization’s sessions, history, and billing must be strictly separated, with no possibility of cross-tenant data leakage
  • Session persistence — users expect to close their laptop, reopen it tomorrow, and resume exactly where they left off, with full conversation history and event replay
  • Billing enforcement — credit reservations must be checked before an expensive LLM turn begins, not after
  • Role-based access control — owners, admins, and members have different permissions over sessions, team management, and billing
  • Connection resilience — clients disconnect, servers restart, agents crash; the system must recover gracefully from all of these without losing data
  • Horizontal scalability — the gateway must scale to thousands of concurrent sessions without requiring shared state between instances
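The billing requirement depends on ordering: the hold is placed before the turn starts, and the real charge is settled afterwards. Below is a minimal sketch of that ordering. `CreditLedger`, `reserve`, `settle`, and `runTurn` are hypothetical names made up for illustration, not Diminuendo's billing API, and costs are plain numbers in unspecified credit units.

```typescript
// Hypothetical sketch: hold credits before an LLM turn starts, settle afterwards.
// CreditLedger, reserve, settle, and runTurn are illustrative names only.

type ReservationId = number;

class CreditLedger {
  private balance: number;
  private readonly held = new Map<ReservationId, number>();
  private nextId = 1;

  constructor(initialCredits: number) {
    this.balance = initialCredits;
  }

  // Credits neither spent nor held by an in-flight turn.
  available(): number {
    let heldTotal = 0;
    this.held.forEach((amount) => {
      heldTotal += amount;
    });
    return this.balance - heldTotal;
  }

  // Place a hold for the estimated cost; null means the turn must not start.
  reserve(estimatedCost: number): ReservationId | null {
    if (estimatedCost > this.available()) return null;
    const id = this.nextId++;
    this.held.set(id, estimatedCost);
    return id;
  }

  // Release the hold and charge what the turn actually consumed.
  settle(id: ReservationId, actualCost: number): void {
    if (!this.held.delete(id)) throw new Error(`unknown reservation ${id}`);
    this.balance -= actualCost;
  }

  getBalance(): number {
    return this.balance;
  }
}

// The turn body runs only after a successful reservation.
async function runTurn(
  ledger: CreditLedger,
  estimatedCost: number,
  turn: () => Promise<number>, // resolves to the actual cost incurred
): Promise<"completed" | "insufficient_credits"> {
  const reservation = ledger.reserve(estimatedCost);
  if (reservation === null) return "insufficient_credits";
  let actualCost = 0;
  try {
    actualCost = await turn();
  } finally {
    // If the turn throws, this simplified version releases the hold and charges nothing.
    ledger.settle(reservation, actualCost);
  }
  return "completed";
}
```

Because concurrent turns each hold their own estimate, two simultaneous turns cannot both spend the same remaining credits. The check is against `available()`, not the raw balance.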
These are not exotic requirements. They are the table stakes of any production-grade AI platform. The question is whether you solve them with a patchwork of middleware, external services, and operational glue — or whether you solve them once, correctly, in a purpose-built gateway.

The Solution

Diminuendo is a lean, single-binary gateway of approximately 8,000 lines of Effect TS, running on Bun. Alongside interactive turns, it runs an automation engine for scheduled runs, heartbeats, and background triage through the same event pipeline. Persistence uses SQLite in WAL mode, so there is nothing to operate: no Postgres cluster, no Redis dependency, no external state store. Each tenant gets its own SQLite database for session metadata, and each session gets its own database for conversation history and events. Because tenant data is isolated at the file level, horizontal scaling requires nothing more than routing tenants to different gateway instances. Migrations are applied automatically on first access. There is no Docker image to pull, no Kubernetes manifest to deploy, no database provisioning step. Clone, install, run.
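Since tenants share no state, the routing decision can be a pure function of the tenant ID. The sketch below assumes a fixed list of gateway instance names and a hypothetical `<dataDir>/<tenant>/...` file layout, because the real directory structure is not documented on this page. It uses rendezvous (highest-random-weight) hashing, chosen here for illustration, so that removing an instance reassigns only the tenants that instance owned.

```typescript
// Hypothetical sketch: tenant-to-instance routing and per-tenant/per-session
// database paths. Instance names and the path layout are assumptions.

// 32-bit FNV-1a hash over UTF-16 code units (fine for routing, not for security).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Rendezvous hashing: the tenant goes to the instance with the highest score.
function instanceForTenant(tenantId: string, instances: readonly string[]): string {
  if (instances.length === 0) throw new Error("no gateway instances configured");
  let best = "";
  let bestScore = -1; // every real score is >= 0, so the first instance always wins initially
  for (const instance of instances) {
    const score = fnv1a(`${tenantId}:${instance}`);
    if (score > bestScore) {
      best = instance;
      bestScore = score;
    }
  }
  return best;
}

// Reject IDs that could escape the data directory (e.g. "../other-tenant").
function safeId(id: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(id)) throw new Error(`invalid id: ${id}`);
  return id;
}

// One metadata database per tenant.
function tenantDbPath(dataDir: string, tenantId: string): string {
  return `${dataDir}/${safeId(tenantId)}/tenant.db`;
}

// One history/event database per session, nested under its tenant.
function sessionDbPath(dataDir: string, tenantId: string, sessionId: string): string {
  return `${dataDir}/${safeId(tenantId)}/sessions/${safeId(sessionId)}.db`;
}
```

Each database opened this way would then be put into write-ahead logging with SQLite's `PRAGMA journal_mode=WAL;`. That setting persists in the database file once applied.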

Position in the Platform

Diminuendo sits between frontend clients and three backend services, each responsible for a distinct domain:
                     +-------------------------+
                     |         Clients         |
                     |   Web | Desktop | CLI   |
                     +------------+------------+
                                  |
                         WebSocket (wss://)
                                  |
                     +------------v------------+
                     |       Diminuendo        |
                     |        (Gateway)        |
                     |                         |
                     |   Auth     | Sessions   |
                     |   Billing  | RBAC       |
                     +----+-------+-------+----+
                          |       |       |
              +-----------+       |       +-----------+
              |                   |                   |
        +-----v------+      +-----v------+      +-----v------+
        |   Podium   |      |  Ensemble  |      | Chronicle  |
        |  (Agents)  |      |   (LLMs)   |      |  (Files)   |
        +------------+      +------------+      +------------+
Service    | Responsibility
-----------|---------------
Podium     | Agent orchestration: creates agent instances, manages their lifecycle, routes messages to running agents, and streams their output back
Ensemble   | LLM inference: model selection, token accounting, provider failover, and cost estimation
Chronicle  | Workspace filesystem: content-addressed storage, file versioning, and bidirectional sync with local filesystems
Diminuendo does not contain any AI logic. It does not call LLM APIs directly. It does not manage files. Its sole concern is being an excellent gateway: authenticating clients, managing session lifecycle, routing messages to the correct backend service, streaming events back to all subscribed clients, persisting everything to durable storage, and enforcing billing and access control.

Protocol at a Glance

The Diminuendo wire protocol is a typed, JSON-over-WebSocket protocol with 21 client message types and 51 server event types.
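In TypeScript, a typed JSON protocol like this one maps naturally onto discriminated unions keyed on a `type` field. The sketch below covers only a handful of messages. The `welcome` event, its `requiresAuth` flag, and the `authenticate` message appear under Getting Started. Every other type and field name, including `token`, is a hypothetical placeholder and does not reproduce the actual 21/51-message schema.

```typescript
// Hypothetical subset of the wire protocol as discriminated unions.
// Only `welcome.requiresAuth` and the `authenticate` message are documented;
// all other names and fields are placeholders.

type ClientMessage =
  | { type: "authenticate"; token: string }
  | { type: "create_session"; name?: string }
  | { type: "run_turn"; sessionId: string; text: string };

type ServerEvent =
  | { type: "welcome"; requiresAuth: boolean }
  | { type: "text_delta"; sessionId: string; text: string }
  | { type: "error"; message: string };

// Narrow an untrusted JSON frame to a ServerEvent, rejecting unknown shapes.
function parseServerEvent(raw: string): ServerEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const msg = data as Record<string, unknown>;
  switch (msg.type) {
    case "welcome":
      return typeof msg.requiresAuth === "boolean"
        ? { type: "welcome", requiresAuth: msg.requiresAuth }
        : null;
    case "text_delta":
      return typeof msg.sessionId === "string" && typeof msg.text === "string"
        ? { type: "text_delta", sessionId: msg.sessionId, text: msg.text }
        : null;
    case "error":
      return typeof msg.message === "string" ? { type: "error", message: msg.message } : null;
    default:
      return null; // unknown event type
  }
}

// Serialize an outgoing message; the union type rejects malformed messages at compile time.
function encode(message: ClientMessage): string {
  return JSON.stringify(message);
}
```

With the union in place, a `switch` on `event.type` gives the compiler enough information to narrow each branch to the right payload shape.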

  • 21 client messages: session CRUD, turn execution, file access, team management, and connection lifecycle
  • 51 server events: streaming text, tool calls, thinking blocks, terminal output, sandbox lifecycle, billing updates, and more
  • 7-state machine: each session transitions through inactive, activating, ready, running, waiting, deactivating, and error
  • Automation system: scheduled automations, heartbeats, and inbox-driven background runs
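This page names the seven session states but not the transitions allowed between them. The following minimal sketch shows how a transition table can be enforced. The table itself is an assumption made for illustration. For example, it assumes `waiting` means the agent is paused on a permission request and resumes into `running`, and that `error` can be retried through `activating`.

```typescript
// Hypothetical session state machine. The seven states are documented;
// the transition table below is an illustrative assumption.

type SessionState =
  | "inactive"
  | "activating"
  | "ready"
  | "running"
  | "waiting"
  | "deactivating"
  | "error";

const transitions: Record<SessionState, readonly SessionState[]> = {
  inactive: ["activating"],
  activating: ["ready", "error"],
  ready: ["running", "deactivating", "error"],
  running: ["ready", "waiting", "error"], // turn completes, pauses, or fails
  waiting: ["running", "error"], // e.g. resumes after a permission decision
  deactivating: ["inactive", "error"],
  error: ["activating", "inactive"], // retry or give up
};

function canTransition(from: SessionState, to: SessionState): boolean {
  return transitions[from].includes(to);
}

// Returns the new state, or throws so an illegal transition is never persisted.
function transition(from: SessionState, to: SessionState): SessionState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal session transition: ${from} -> ${to}`);
  }
  return to;
}
```

Typing the table as `Record<SessionState, ...>` means the compiler flags any state that is missing a row.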

SDKs

Four official client SDKs provide typed wrappers around the wire protocol:

Getting Started

1. Prerequisites

Diminuendo runs on Bun — a JavaScript runtime with native WebSocket and SQLite support. Install it with:
curl -fsSL https://bun.sh/install | bash
Bun 1.0 or later is required.

2. Clone with Submodules

The repository uses git submodules for its backend service dependencies (Podium, Ensemble, Chronicle). Clone with --recurse-submodules to pull them in a single step:
git clone --recurse-submodules https://github.com/iGentAI/diminuendo.git
cd diminuendo
If you already cloned without --recurse-submodules, initialize them after the fact:
git submodule update --init --recursive

3. Install Dependencies

bun install

4. Start the Gateway

bun run --watch src/main.ts
You should see output similar to:
[info] Config: dev mode enabled
[info] Auth: Dev mode enabled — all requests authenticated as developer@example.com
[info] Gateway listening on 0.0.0.0:8080 (dev mode — auth bypassed)
In dev mode, every WebSocket connection is automatically authenticated as developer@example.com in the dev tenant. No Auth0 configuration, no JWT tokens, no environment variables required. The gateway creates a ./data directory on first run containing SQLite databases for tenant registries and session data. This directory is gitignored — delete it at any time to reset all state.
In production, the welcome message will have requiresAuth: true. You must send an authenticate message with a valid JWT before the gateway will accept any other messages. In dev mode, this step is skipped entirely.
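A client can handle both modes with one code path by inspecting the welcome event. The following is a minimal sketch using the standard `WebSocket` API. Only `welcome`, `requiresAuth`, and `authenticate` come from this page. The `token` field, the `connect` and `getJwt` helpers, and the URL are hypothetical. The decision logic lives in a pure function so it can be tested without a running gateway.

```typescript
// Hypothetical client handshake. Documented: the `welcome` event, its
// `requiresAuth` flag, and the `authenticate` message. The `token` field is an assumption.

interface Welcome {
  type: "welcome";
  requiresAuth: boolean;
}

interface Authenticate {
  type: "authenticate";
  token: string;
}

// Pure decision: what (if anything) to send after receiving `welcome`.
function replyToWelcome(welcome: Welcome, jwt: string | undefined): Authenticate | null {
  if (!welcome.requiresAuth) return null; // dev mode: already authenticated
  if (!jwt) throw new Error("gateway requires authentication but no JWT is available");
  return { type: "authenticate", token: jwt };
}

// Wiring to a standard WebSocket (global in browsers, Bun, Deno, and Node 22+).
function connect(url: string, getJwt: () => string | undefined): WebSocket {
  const ws = new WebSocket(url);
  ws.addEventListener("message", (event: MessageEvent) => {
    const msg = JSON.parse(String(event.data));
    if (msg.type === "welcome") {
      const reply = replyToWelcome(msg as Welcome, getJwt());
      if (reply) ws.send(JSON.stringify(reply));
    }
  });
  return ws;
}
```

Against the dev gateway started above, something like `connect("ws://localhost:8080", () => undefined)` should be enough, assuming the WebSocket endpoint is served at the root path. This page does not state the path.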