Diminuendo

In musical notation, diminuendo denotes a gradual decrease in volume: a controlled recession from complexity toward clarity. The gateway serves an analogous function in the iGentAI architecture. It reduces a distributed system of agent orchestrators, LLM inference engines, workspace filesystems, billing ledgers, and access control policies into a single, coherent WebSocket protocol that any client can consume with a few dozen lines of code.

Diminuendo is the sole entry point through which every frontend client (web browser, desktop application, or CLI tool) communicates with the AI agent infrastructure. No client ever speaks directly to an agent orchestrator, an LLM inference service, or a workspace filesystem. The gateway is the narrow waist of the entire system: every byte of agent interaction passes through it, and it is responsible for ensuring that each byte arrives authenticated, authorized, rate-limited, persisted, and billed.

The Problem

AI coding agents are not request-response APIs. A single user interaction — “refactor this module to use dependency injection” — may produce dozens of real-time events over several minutes: thinking blocks as the model reasons, tool calls as it reads and writes files, terminal output as it runs tests, permission requests when it encounters sensitive operations, and a structured completion with token usage. All of this must stream to the client with sub-100ms latency for interactive use. Beyond the wire protocol, a production gateway must solve several orthogonal problems simultaneously:

Multi-tenant isolation — each organization’s sessions, history, and billing must be strictly separated, with no possibility of cross-tenant data leakage
Session persistence — users expect to close their laptop, reopen it tomorrow, and resume exactly where they left off, with full conversation history and event replay
Billing enforcement — credit reservations must be checked before an expensive LLM turn begins, not after
Role-based access control — owners, admins, and members have different permissions over sessions, team management, and billing
Connection resilience — clients disconnect, servers restart, agents crash; the system must recover gracefully from all of these without losing data
Horizontal scalability — the gateway must scale to thousands of concurrent sessions without requiring shared state between instances

These are not exotic requirements. They are the table stakes of any production-grade AI platform. The question is whether you solve them with a patchwork of middleware, external services, and operational glue — or whether you solve them once, correctly, in a purpose-built gateway.
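The billing requirement in the list above (check credits before a turn begins, not after) is commonly handled with a reserve-then-settle pattern. The following is a minimal sketch of that pattern, not Diminuendo's actual ledger: the `CreditLedger` class, its method names, and the choice to cap the final charge at the reserved amount are all assumptions made for illustration.

```typescript
// Hypothetical in-memory ledger. Names, units, and the capping rule are
// illustrative assumptions, not the gateway's real billing code.
type Reservation = { readonly id: number; readonly amount: number };

class CreditLedger {
  private balance: number;
  private reserved = 0;
  private nextId = 1;

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  // Credits that are neither spent nor held by an in-flight turn.
  get available(): number {
    return this.balance - this.reserved;
  }

  // Called BEFORE the turn starts. Returns null when credits are insufficient,
  // in which case the turn should be rejected without contacting any backend.
  reserve(amount: number): Reservation | null {
    if (amount <= 0 || amount > this.available) return null;
    this.reserved += amount;
    return { id: this.nextId++, amount };
  }

  // Called AFTER the turn ends: release the hold and charge the actual cost,
  // capped at what was reserved (an assumption of this sketch).
  settle(reservation: Reservation, actualCost: number): void {
    this.reserved -= reservation.amount;
    this.balance -= Math.min(actualCost, reservation.amount);
  }
}
```

Because the hold is taken up front, two concurrent turns cannot both spend the same credits, which is the property that an after-the-fact check cannot provide.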

The Solution

Diminuendo is a lean, single-binary gateway — approximately 8,000 lines of Effect TS — running on Bun. It uses SQLite in WAL mode for zero-ops persistence: no Postgres cluster, no Redis dependency, no external state store. Each tenant gets its own SQLite database for session metadata; each session gets its own database for conversation history and events. This per-tenant data isolation means horizontal scaling requires nothing more than routing tenants to different gateway instances. The system also includes an automation engine for scheduled runs, heartbeats, and background triage, using the same event pipeline as interactive turns. Migrations are applied automatically on first access. There is no Docker image to pull, no Kubernetes manifest to deploy, no database provisioning step. Clone, install, run.
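To make the storage layout and scaling claim concrete, here is a small sketch of how per-tenant and per-session database paths might be derived and how tenants could be pinned to gateway instances. The directory structure, file names, function names, and the hash-based routing are all assumptions; the source states only that each tenant and each session gets its own SQLite database and that tenants can be routed to different instances.

```typescript
// Hypothetical on-disk layout. Only the existence of one database per tenant
// and one per session comes from the text; the paths below are assumed.
function tenantDbPath(dataDir: string, tenantId: string): string {
  return `${dataDir}/tenants/${safeSegment(tenantId)}/registry.db`;
}

function sessionDbPath(dataDir: string, tenantId: string, sessionId: string): string {
  return `${dataDir}/tenants/${safeSegment(tenantId)}/sessions/${safeSegment(sessionId)}.db`;
}

// Reject identifiers that could escape the data directory (for example "../x").
function safeSegment(id: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(id)) throw new Error(`invalid identifier: ${id}`);
  return id;
}

// FNV-1a over UTF-16 code units: a small, deterministic 32-bit hash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Pin every tenant to exactly one instance so that no instance ever needs
// another instance's SQLite files.
function instanceForTenant(tenantId: string, instances: readonly string[]): string {
  const chosen = instances[fnv1a(tenantId) % instances.length];
  if (chosen === undefined) throw new Error("no gateway instances configured");
  return chosen;
}
```

Plain modulo routing reassigns many tenants whenever the instance count changes, so a real deployment would more likely use a lookup table or consistent hashing. The sketch shows only that routing can be a pure function of the tenant ID.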

Position in the Platform

Diminuendo sits between frontend clients and three backend services, each responsible for a distinct domain:

                           +------------------+
                           |     Clients      |
                           | Web, Desktop, CLI|
                           +--------+---------+
                                    |
                            WebSocket (wss://)
                                    |
                           +--------v---------+
                           |   Diminuendo     |
                           |   (Gateway)      |
                           |                  |
                           |  Auth | Sessions |
                           |  Billing | RBAC  |
                           +--+-----+------+--+
                              |     |      |
                    +---------+     |      +---------+
                    |               |                |
              +-----v-----+    +----v-----+    +-----v-----+
              |  Podium   |    | Ensemble |    | Chronicle |
              | (Agents)  |    |  (LLMs)  |    |  (Files)  |
              +-----------+    +----------+    +-----------+

Service	Responsibility
Podium	Agent orchestration — creates agent instances, manages their lifecycle, routes messages to running agents, streams their output back
Ensemble	LLM inference — model selection, token accounting, provider failover, cost estimation
Chronicle	Workspace filesystem — content-addressed storage, file versioning, bidirectional sync with local filesystems

Diminuendo does not contain any AI logic. It does not call LLM APIs directly. It does not manage files. Its sole concern is being an excellent gateway: authenticating clients, managing session lifecycle, routing messages to the correct backend service, streaming events back to all subscribed clients, persisting everything to durable storage, and enforcing billing and access control.

Protocol at a Glance

The Diminuendo wire protocol is a typed, JSON-over-WebSocket protocol with 21 client message types and 51 server event types.
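A typed JSON-over-WebSocket protocol is usually modeled in TypeScript as discriminated unions keyed on a `type` field. The sketch below illustrates the idea with a tiny subset. Only `welcome`, `authenticate`, and `requiresAuth` are named in this document; `run_turn`, `text_delta`, and every other field name are hypothetical placeholders, not the real message catalogue.

```typescript
// Hypothetical message shapes for illustration. Only "welcome", "authenticate",
// and the requiresAuth field appear in this document; the rest are assumed.
type ClientMessage =
  | { type: "authenticate"; token: string }
  | { type: "run_turn"; sessionId: string; text: string };

type ServerEvent =
  | { type: "welcome"; requiresAuth: boolean }
  | { type: "text_delta"; sessionId: string; text: string };

// Serialize a client message into a single WebSocket text frame.
function encode(message: ClientMessage): string {
  return JSON.stringify(message);
}

// Parse and validate an incoming frame. Unknown or malformed frames yield null.
function decode(raw: string): ServerEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const fields = value as Record<string, unknown>;
  const type = fields["type"];
  if (type === "welcome") {
    const requiresAuth = fields["requiresAuth"];
    return typeof requiresAuth === "boolean" ? { type, requiresAuth } : null;
  }
  if (type === "text_delta") {
    const sessionId = fields["sessionId"];
    const text = fields["text"];
    return typeof sessionId === "string" && typeof text === "string"
      ? { type, sessionId, text }
      : null;
  }
  return null;
}

// The discriminant lets the compiler narrow each branch to its own fields.
function describe(event: ServerEvent): string {
  switch (event.type) {
    case "welcome":
      return event.requiresAuth ? "auth required" : "dev mode";
    case "text_delta":
      return `text for ${event.sessionId}: ${event.text}`;
  }
}
```

Validating at the boundary keeps the rest of the client free of `unknown` values, and adding a new event type to the union makes the compiler flag every `switch` that does not yet handle it.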

21 Client Messages

Session CRUD, turn execution, file access, team management, and connection lifecycle

51 Server Events

Streaming text, tool calls, thinking blocks, terminal output, sandbox lifecycle, billing updates, and more

7-State Machine

Each session transitions through: inactive, activating, ready, running, waiting, deactivating, error

Automation System

Scheduled automations, heartbeats, and inbox-driven background runs
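The seven session states listed above can be encoded as a small transition table. The state names come from this page, but the allowed transitions below are an assumption made purely for illustration; this overview does not specify which transitions the gateway actually permits.

```typescript
// The seven state names are taken from the text above.
const STATES = [
  "inactive",
  "activating",
  "ready",
  "running",
  "waiting",
  "deactivating",
  "error",
] as const;

type SessionState = (typeof STATES)[number];

// ASSUMPTION: this transition table is a plausible guess, not the real one.
const TRANSITIONS: Record<SessionState, readonly SessionState[]> = {
  inactive: ["activating"],
  activating: ["ready", "error"],
  ready: ["running", "deactivating"],
  running: ["waiting", "ready", "error"],
  waiting: ["running", "deactivating", "error"],
  deactivating: ["inactive", "error"],
  error: ["inactive"],
};

function canTransition(from: SessionState, to: SessionState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Apply a transition or fail loudly, so illegal jumps never go unnoticed.
function transition(from: SessionState, to: SessionState): SessionState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal session transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the table as data rather than scattered `if` statements makes it straightforward to audit and to reject, for example, a turn request for a session that is still `activating`.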

SDKs

Four official client SDKs provide typed wrappers around the wire protocol:

TypeScript

Zero-dependency, works in browsers and Node.js/Bun. Promise-based with typed event handlers.

Rust

Tokio-based async client with Stream for events. Designed for Tauri desktop apps and CLI tools.

Python

Asyncio-based client with callback event handling. Suitable for scripting, testing, and notebook integration.

Swift

Actor-based async/await client for macOS and iOS. Codable events with zero third-party dependencies.

Getting Started

Prerequisites

Diminuendo runs on Bun — a JavaScript runtime with native WebSocket and SQLite support. Install it with:

curl -fsSL https://bun.sh/install | bash

Bun 1.0 or later is required.

Clone with Submodules

The repository uses git submodules for its backend service dependencies (Podium, Ensemble, Chronicle). Clone with --recurse-submodules to pull them in a single step:

git clone --recurse-submodules https://github.com/iGentAI/diminuendo.git
cd diminuendo

If you already cloned without --recurse-submodules, initialize them after the fact:

git submodule update --init --recursive

Install Dependencies

bun install

Start the Gateway

bun run --watch src/main.ts

You should see output similar to:

[info] Config: dev mode enabled
[info] Auth: Dev mode enabled — all requests authenticated as developer@example.com
[info] Gateway listening on 0.0.0.0:8080 (dev mode — auth bypassed)

In dev mode, every WebSocket connection is automatically authenticated as developer@example.com in the dev tenant. No Auth0 configuration, JWTs, or environment variables are required. The gateway creates a ./data directory on first run containing SQLite databases for tenant registries and session data. This directory is gitignored, so you can delete it at any time to reset all state.

In production, the welcome message will have requiresAuth: true. You must send an authenticate message with a valid JWT before the gateway will accept any other messages. In dev mode, this step is skipped entirely.
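The handshake described above can be sketched as a pure function that decides which frames a client sends first. The `welcome` and `authenticate` message names and the `requiresAuth` field come from the text; the `token` field name and the helper functions are assumptions. Whether a client must also wait for an acknowledgement before sending further messages is not stated here, so this sketch enforces ordering only.

```typescript
// Names other than "welcome", "authenticate", and requiresAuth are assumed.
type Welcome = { type: "welcome"; requiresAuth: boolean };
type Authenticate = { type: "authenticate"; token: string };

// Decide what the client must send first after receiving the welcome event.
// Returns null when no authentication step is needed (dev mode).
function authFrameFor(welcome: Welcome, jwt: string | undefined): Authenticate | null {
  if (!welcome.requiresAuth) return null;
  if (jwt === undefined || jwt.length === 0) {
    throw new Error("gateway requires authentication but no JWT was provided");
  }
  return { type: "authenticate", token: jwt };
}

// Order outgoing frames so that authentication, when required, always goes
// out before any already-serialized application messages.
function openingFrames(
  welcome: Welcome,
  jwt: string | undefined,
  pending: readonly string[],
): string[] {
  const auth = authFrameFor(welcome, jwt);
  return auth === null ? [...pending] : [JSON.stringify(auth), ...pending];
}
```

In a real client, the result would be passed frame by frame to `WebSocket.send` once the first server message has been parsed as a welcome. In dev mode the same code path sends the pending frames with no authentication step.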
