# Matrix Multi-Agent Routing Design

## Goal

Move the Matrix surface from a single hardcoded upstream agent to a user-selectable multi-agent model, while preserving the existing room-based UX and the current `PlatformClient` boundary.

The result should be:

- one Matrix bot can work with multiple upstream agents
- users can choose an agent from the full configured list
- each chat is bound to exactly one agent
- switching the selected agent does not silently retarget an existing chat

## Core Decision

The selected routing model is:

`user.selected_agent_id + room.agent_id + room.platform_chat_id`

This means:

- the user has one current selected agent
- each Matrix working room stores the agent it is bound to
- each Matrix working room stores its own `platform_chat_id`
- a room never changes agent implicitly
- the shared `PlatformClient` protocol remains unchanged
- Matrix multi-agent routing is implemented by a single routing facade that delegates to per-agent real clients

## Why This Decision

The current Matrix adapter already separates:

- user-facing room organization
- local chat labels such as `C1`, `C2`, `C3`
- platform-facing conversation identity via `platform_chat_id`

Adding multi-agent support should preserve that shape instead of replacing it.

If routing depended only on the current user selection, then an old room could start talking to a different agent after a switch. That would make room history and backend context hard to reason about. Binding an agent to the room keeps the conversation model explicit.
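
As a minimal sketch of this model, the target of a message can be derived from the room's own binding rather than from the user's selection. The `UserState` and `RoomState` types and the `routing_target` helper are hypothetical names for illustration, not part of the existing adapter. The sketch also does not show how a stale room stays stale once the user returns to its agent; that bookkeeping is left open here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class UserState:
    # Hypothetical view of the user-level metadata.
    selected_agent_id: Optional[str]


@dataclass(frozen=True)
class RoomState:
    # Hypothetical view of the room-level metadata.
    agent_id: Optional[str]
    platform_chat_id: Optional[str]


def routing_target(user: UserState, room: RoomState) -> Optional[Tuple[str, str]]:
    """Return (agent_id, platform_chat_id) for a bound, non-stale room, else None."""
    if room.agent_id is None or room.platform_chat_id is None:
        return None  # unbound room: nothing to route to yet
    if room.agent_id != user.selected_agent_id:
        return None  # stale room: never retarget it to the new selection
    # The target always comes from the room, not from the user's selection.
    return (room.agent_id, room.platform_chat_id)
```

Because the returned pair is read from the room, switching `selected_agent_id` can only turn a room's result into `None`; it can never point the room at a different agent.
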

## Scope

This design covers:

- agent selection by the user inside the Matrix surface
- durable storage of the selected agent
- durable storage of the room-bound agent
- routing normal messages and context commands to the correct upstream agent
- behavior when a room becomes stale after an agent switch

This design does not cover:

- per-agent workspace isolation
- platform-side agent lifecycle or memory persistence
- per-user allowlists for available agents
- Telegram or other surfaces

## Configuration Model

### Agent registry

Available agents are defined in a local config file loaded once at bot startup.

Example:

```yaml
agents:
  - id: agent-1
    label: Analyst
  - id: agent-2
    label: Research
  - id: agent-3
    label: Ops
```

Rules:

- every entry must have a stable `id`
- every entry must have a user-visible `label`
- all configured agents are selectable by all users
- config changes apply only after bot restart

### Startup validation

If the agent config is missing, empty, or invalid, the Matrix bot must fail fast on startup with a clear operator error.

## Durable State Model

### User-level state

User metadata keeps the current selected agent.

Example `matrix_user:*` shape:

```json
{
  "space_id": "!space:example.org",
  "next_chat_index": 4,
  "selected_agent_id": "agent-2"
}
```

Meaning:

- `selected_agent_id` controls future chat creation and activation of an unbound room
- `selected_agent_id` does not rewrite already bound rooms

### Room-level state

Room metadata stores the agent bound to that chat.

Example `matrix_room:*` shape:

```json
{
  "room_type": "chat",
  "chat_id": "C3",
  "display_name": "Чат 3",
  "matrix_user_id": "@alice:example.org",
  "space_id": "!space:example.org",
  "platform_chat_id": "42",
  "agent_id": "agent-2"
}
```

Rules:

- one room binds to exactly one `agent_id`
- one room binds to exactly one current `platform_chat_id`
- once a room becomes stale after an agent switch, it never becomes active again

## Runtime Semantics

### `!start`

`!start` remains lightweight:

- if no agent is selected, the bot explains that an agent must be selected before normal messaging
- if an agent is already selected, the bot reports the current selection and reminds the user that `!new` creates a new room under that agent

### `!agent`

Introduce an agent-selection command.

Behavior:

- `!agent` shows the available agent list
- agent selection stores `selected_agent_id` in user metadata
- after a successful switch, the bot tells the user that existing chats bound to another agent are stale and that `!new` is required for continued work

The exact UI can be text-first for MVP. A richer UI can be added later without changing the state model.

### Normal message without selected agent

If the user has not selected an agent yet:

- do not call the platform
- return the available agent list
- ask the user to choose one first

This is an intentional one-time routing handshake, not an accidental fallback. In a multi-agent deployment, the surface must not silently guess which agent an unbound user should talk to.

### Selecting an agent inside an unbound chat

If the current room has never been bound to any agent:

- store the new `selected_agent_id` for the user
- bind the current room to that same `agent_id`
- allow the room to become the active working chat immediately

This avoids forcing `!new` for the user's first usable chat.

### `!new`

`!new` creates a new working room under the current selected agent.

Behavior:

1. require `selected_agent_id`
2. create the new Matrix room
3. allocate a new `platform_chat_id`
4. store `agent_id = selected_agent_id` in the new room metadata

### Normal message in an unbound room with selected agent

If a room exists but has no `agent_id` yet and the user already has `selected_agent_id`:

- bind the room to `selected_agent_id`
- ensure it has `platform_chat_id`
- continue normal message dispatch

### Normal message in a bound room

If the room already has `agent_id` and it matches the current selected agent:

- route the message to that `agent_id`
- use the room's `platform_chat_id`

### Stale room after agent switch

If the room's bound `agent_id` differs from the user's current `selected_agent_id`:

- do not call the platform
- treat the room as stale
- return a short message telling the user that this chat belongs to the old agent and that they must use `!new`

### Returning to a previously selected agent

If the user later selects an old agent again:

- previously stale rooms do not become valid again
- the user must still create a fresh room via `!new`

## Routing and Component Changes

### Agent registry loader

Add a small loader responsible for:

- reading `agents.yaml`
- validating ids and labels
- exposing a read-only registry to runtime code

The runtime should not parse YAML ad hoc during message handling.

### Matrix runtime pre-check

Before dispatching a normal message, the Matrix runtime must resolve:

- whether the user has `selected_agent_id`
- whether the current room already has `agent_id`
- whether the room can be bound now
- whether the room is stale

This pre-check happens before handing the message to the existing dispatcher path.

### Routed platform client

The selected implementation keeps the shared `PlatformClient` protocol unchanged.

The Matrix runtime owns one routing-aware facade, for example `RoutedPlatformClient`, that implements `PlatformClient` and delegates to agent-specific real clients.

Responsibilities:

- resolve the current room binding from local Matrix metadata
- translate a local Matrix logical chat id into the room's `platform_chat_id`
- choose the correct per-agent delegate for the room's bound `agent_id`
- keep `get_or_create_user`, `get_settings`, and `update_settings` behavior stable for the rest of the runtime

This keeps the multi-agent logic inside the Matrix integration boundary instead of pushing agent selection into the shared protocol.

### Real platform bridge delegates

The current real backend path hardcodes a single runtime-level `agent_id`. That must be replaced with per-agent delegates hidden behind the routing facade.

The selected design is:

- `RealPlatformClient` remains the low-level direct-agent delegate for one configured `agent_id`
- the routing facade holds or creates one `RealPlatformClient` delegate per configured agent
- `send_message(...)` and `stream_message(...)` on the facade resolve the room target and forward the call to the matching delegate
- the delegate creates a fresh upstream `AgentApi` for its configured `agent_id`
- no long-lived `AgentApi` instances are cached by user

This preserves the current fresh-connection-per-request behavior while avoiding a protocol break for Telegram or other surfaces.
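
The delegation described above could look roughly like the following sketch. It assumes a heavily simplified `send_message(chat_id, text)` signature, because the real `PlatformClient` signatures are not shown in this document. `ChatClient`, `FakeAgentDelegate`, and the in-memory `rooms` mapping are hypothetical stand-ins for the real protocol, `RealPlatformClient`, and the Matrix metadata store.

```python
from typing import Dict, Protocol


class ChatClient(Protocol):
    # Hypothetical, simplified stand-in for the shared PlatformClient protocol.
    def send_message(self, chat_id: str, text: str) -> str: ...


class FakeAgentDelegate:
    """Stand-in for a per-agent RealPlatformClient; one instance per agent_id."""

    def __init__(self, agent_id: str) -> None:
        self.agent_id = agent_id

    def send_message(self, chat_id: str, text: str) -> str:
        # A real delegate would create a fresh upstream AgentApi here.
        return f"{self.agent_id}/{chat_id}: {text}"


class RoutedPlatformClient:
    """Facade that maps a local chat id to the room's bound agent and platform chat."""

    def __init__(self, delegates: Dict[str, ChatClient], rooms: Dict[str, dict]) -> None:
        self._delegates = delegates  # agent_id -> delegate
        self._rooms = rooms  # local chat id (e.g. "C3") -> room metadata

    def send_message(self, chat_id: str, text: str) -> str:
        room = self._rooms[chat_id]
        # Pick the delegate from the room binding, never from the user selection.
        delegate = self._delegates[room["agent_id"]]
        # Translate the local chat id into the room's platform_chat_id.
        return delegate.send_message(room["platform_chat_id"], text)
```

Callers keep passing the local chat id, so the rest of the runtime does not need to know that more than one agent exists.
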

## Error Handling

### Missing or invalid selected agent

If `selected_agent_id` is absent:

- ask the user to select an agent

If `selected_agent_id` points to an agent that no longer exists in config:

- treat the selection as invalid
- ask the user to select again

### Missing room binding

If the room has no `agent_id`:

- bind it only when the user has a valid current selection
- otherwise return the selection prompt

### Stale room

If the room is stale:

- do not attempt fallback routing
- do not silently rewrite room metadata
- instruct the user to run `!new`

### Invalid config

If the bot cannot load a valid agent registry:

- fail at startup
- do not start in degraded single-agent mode

## Testing Expectations

Tests for this design should prove:

- config parsing and startup validation
- selecting an agent persists `selected_agent_id`
- selecting an agent inside an unbound room activates that room
- `!new` binds the new room to the selected agent
- messages in a bound room use that room's `agent_id`
- stale rooms reject normal messaging with a clear `!new` instruction
- returning to the same agent later does not revive stale rooms

## Migration Notes

Existing rooms may have `platform_chat_id` but no `agent_id`.

For this MVP, treat those rooms as legacy-unbound rooms:

- if the user has a valid selected agent, the room may be bound on first use
- if no agent is selected, the room prompts for selection first

No automatic migration across agents is introduced.

### Existing users without `selected_agent_id`

Existing users upgraded from the single-agent model may have working rooms but no stored `selected_agent_id`.

For this MVP, that is handled explicitly:

- normal messaging is paused until the user selects an agent
- the first valid selection can bind an unbound room immediately
- the surface does not auto-assign a default agent in a multi-agent config

This is intentional.
Once more than one agent exists, silent migration would be ambiguous and could route a user to the wrong backend target.
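
The legacy-room rule could be sketched as below, assuming user and room metadata are plain dictionaries in the `matrix_user:*` and `matrix_room:*` shapes. `bind_legacy_room` and `known_agents` are hypothetical names used only for this illustration.

```python
from typing import Optional, Set


def bind_legacy_room(user_meta: dict, room_meta: dict, known_agents: Set[str]) -> Optional[dict]:
    """Return the room metadata to store, or None if the user must select an agent first."""
    if room_meta.get("agent_id") is not None:
        # Already bound: this helper never rewrites an existing binding.
        return room_meta
    selected = user_meta.get("selected_agent_id")
    if selected not in known_agents:
        # No selection, or a selection that is no longer in config: never guess a default.
        return None
    # Keep the existing platform_chat_id and add only the agent binding.
    return {**room_meta, "agent_id": selected}
```

Returning a new dictionary instead of mutating `room_meta` keeps the write to durable storage an explicit step for the caller.
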