Matrix Multi-Agent Routing Design
Goal
Move the Matrix surface from a single hardcoded upstream agent to a user-selectable multi-agent model, while preserving the existing room-based UX and the current PlatformClient boundary.
The result should be:
- one Matrix bot can work with multiple upstream agents
- users can choose an agent from the full configured list
- each chat is bound to exactly one agent
- switching the selected agent does not silently retarget an existing chat
Core Decision
The selected routing model is:
user.selected_agent_id + room.agent_id + room.platform_chat_id
This means:
- the user has one current selected agent
- each Matrix working room stores the agent it is bound to
- each Matrix working room stores its own platform_chat_id
- a room never changes agent implicitly
- the shared PlatformClient protocol remains unchanged
- Matrix multi-agent routing is implemented by a single routing facade that delegates to per-agent real clients
Why This Decision
The current Matrix adapter already separates:
- user-facing room organization
- local chat labels such as C1, C2, C3
- platform-facing conversation identity via platform_chat_id
Adding multi-agent support should preserve that shape instead of replacing it.
If routing depended only on the current user selection, then an old room could start talking to a different agent after a switch. That would make room history and backend context hard to reason about. Binding an agent to the room keeps the conversation model explicit.
Scope
This design covers:
- agent selection by the user inside the Matrix surface
- durable storage of the selected agent
- durable storage of the room-bound agent
- routing normal messages and context commands to the correct upstream agent
- behavior when a room becomes stale after an agent switch
This design does not cover:
- per-agent workspace isolation
- platform-side agent lifecycle or memory persistence
- per-user allowlists for available agents
- Telegram or other surfaces
Configuration Model
Agent registry
Available agents are defined in a local config file loaded once at bot startup.
Example:
agents:
  - id: agent-1
    label: Analyst
  - id: agent-2
    label: Research
  - id: agent-3
    label: Ops
Rules:
- every entry must have a stable id
- every entry must have a user-visible label
- all configured agents are selectable by all users
- config changes apply only after bot restart
Startup validation
If the agent config is missing, empty, or invalid, the Matrix bot must fail fast on startup with a clear operator error.
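A minimal sketch of the fail-fast check, assuming the config file has already been parsed into plain Python objects. The names AgentConfigError and validate_agents are hypothetical, and rejecting duplicate ids is an assumption drawn from the "stable id" rule rather than something stated above.

```python
class AgentConfigError(Exception):
    """Raised at startup when the agent registry is unusable."""


def validate_agents(raw: object) -> dict:
    """Return {agent_id: label} or raise AgentConfigError.

    `raw` is the already-parsed config document, expected to look like
    {"agents": [{"id": ..., "label": ...}, ...]}.
    """
    if not isinstance(raw, dict) or "agents" not in raw:
        raise AgentConfigError("agent config is missing the 'agents' list")
    entries = raw["agents"]
    if not isinstance(entries, list) or not entries:
        raise AgentConfigError("agent config must list at least one agent")

    agents = {}
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise AgentConfigError(f"agents[{index}] must be a mapping")
        agent_id, label = entry.get("id"), entry.get("label")
        if not isinstance(agent_id, str) or not agent_id.strip():
            raise AgentConfigError(f"agents[{index}] has no usable id")
        if not isinstance(label, str) or not label.strip():
            raise AgentConfigError(f"agents[{index}] ({agent_id}) has no label")
        if agent_id in agents:
            # Assumption: ids must be unique for room bindings to stay unambiguous.
            raise AgentConfigError(f"duplicate agent id: {agent_id}")
        agents[agent_id] = label
    return agents
```

The bot entry point would call this once and let the exception abort startup, so the operator sees the message instead of a half-working bot.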
Durable State Model
User-level state
User metadata keeps the current selected agent.
Example matrix_user:* shape:
{
  "space_id": "!space:example.org",
  "next_chat_index": 4,
  "selected_agent_id": "agent-2"
}
Meaning:
- selected_agent_id controls future chat creation and activation of an unbound room
- selected_agent_id does not rewrite already bound rooms
Room-level state
Room metadata stores the agent bound to that chat.
Example matrix_room:* shape:
{
  "room_type": "chat",
  "chat_id": "C3",
  "display_name": "Chat 3",
  "matrix_user_id": "@alice:example.org",
  "space_id": "!space:example.org",
  "platform_chat_id": "42",
  "agent_id": "agent-2"
}
Rules:
- one room binds to exactly one agent_id
- one room binds to exactly one current platform_chat_id
- once a room becomes stale after an agent switch, it never becomes active again
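The two metadata shapes can be written down as typed dictionaries. This is only a sketch: UserMeta, RoomMeta, and route_target are hypothetical names, and the field list simply mirrors the JSON examples in this section.

```python
from typing import Optional, Tuple, TypedDict


class UserMeta(TypedDict, total=False):
    space_id: str
    next_chat_index: int
    selected_agent_id: str  # absent until the user picks an agent


class RoomMeta(TypedDict, total=False):
    room_type: str
    chat_id: str            # local label such as "C3"
    display_name: str
    matrix_user_id: str
    space_id: str
    platform_chat_id: str   # platform-facing conversation identity
    agent_id: str           # absent for unbound rooms


def route_target(room: RoomMeta) -> Optional[Tuple[str, str]]:
    """Return (agent_id, platform_chat_id) for a fully bound room, else None."""
    agent_id = room.get("agent_id")
    platform_chat_id = room.get("platform_chat_id")
    if agent_id and platform_chat_id:
        return agent_id, platform_chat_id
    return None
```

Marking every key optional (total=False) keeps rooms without agent_id representable, which matters for rooms that have not been bound yet.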
Runtime Semantics
!start
!start remains lightweight:
- if no agent is selected, the bot explains that an agent must be selected before normal messaging
- if an agent is already selected, the bot reports the current selection and reminds the user that !new creates a new room under that agent
!agent
Introduce an agent-selection command.
Behavior:
- !agent shows the available agent list
- agent selection stores selected_agent_id in user metadata
- after a successful switch, the bot tells the user that existing chats bound to another agent are stale and that !new is required for continued work
The exact UI can be text-first for MVP. A richer UI can be added later without changing the state model.
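A text-first version of the command could look like the sketch below. The "!agent <id>" selection syntax, the reply wording, and the function name handle_agent_command are assumptions made for illustration; persistence of user_meta is left to the caller.

```python
from typing import Dict, Mapping


def handle_agent_command(
    body: str, user_meta: Dict[str, object], agents: Mapping[str, str]
) -> str:
    """Handle "!agent" (list) and "!agent <id>" (select); return the reply text."""
    parts = body.split()
    if len(parts) == 1:
        lines = [f"- {agent_id}: {label}" for agent_id, label in agents.items()]
        return "Available agents:\n" + "\n".join(lines)

    requested = parts[1]
    if requested not in agents:
        return f"Unknown agent '{requested}'. Send !agent to see the list."

    previous = user_meta.get("selected_agent_id")
    user_meta["selected_agent_id"] = requested  # caller persists user_meta
    reply = f"Selected agent: {agents[requested]}."
    if previous is not None and previous != requested:
        # Only warn on a real switch; a first selection has no stale chats.
        reply += " Chats bound to another agent are now stale; use !new to continue."
    return reply
```

Because the handler only touches selected_agent_id, a richer UI can replace the text parsing later without changing the stored state.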
Normal message without selected agent
If the user has not selected an agent yet:
- do not call the platform
- return the available agent list
- ask the user to choose one first
This is an intentional one-time routing handshake, not an accidental fallback. In a multi-agent deployment, the surface must not silently guess which agent an unbound user should talk to.
Selecting an agent inside an unbound chat
If the current room has never been bound to any agent:
- store the new selected_agent_id for the user
- bind the current room to that same agent_id
- allow the room to become the active working chat immediately
This avoids forcing !new for the user's first usable chat.
!new
!new creates a new working room under the current selected agent.
Behavior:
- require selected_agent_id
- create the new Matrix room
- allocate a new platform_chat_id
- store agent_id = selected_agent_id in the new room metadata
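The metadata side of these steps can be sketched as follows. Creating the Matrix room itself is left to the caller, allocate_platform_chat_id is a hypothetical callable standing in for whatever the platform provides, and deriving the chat label from next_chat_index is an assumption based on the example shapes earlier in this document.

```python
from typing import Callable, Dict, Optional


def build_new_room_meta(
    user_meta: Dict[str, object],
    matrix_user_id: str,
    allocate_platform_chat_id: Callable[[], str],
) -> Optional[Dict[str, object]]:
    """Return metadata for a new chat room, or None when no agent is selected."""
    agent_id = user_meta.get("selected_agent_id")
    if not agent_id:
        return None  # caller replies with the selection prompt instead

    index = int(user_meta.get("next_chat_index", 1))
    user_meta["next_chat_index"] = index + 1
    return {
        "room_type": "chat",
        "chat_id": f"C{index}",
        "display_name": f"Chat {index}",
        "matrix_user_id": matrix_user_id,
        "space_id": user_meta.get("space_id"),
        "platform_chat_id": allocate_platform_chat_id(),
        "agent_id": agent_id,  # bound once, at creation time
    }
```

Nothing is allocated or incremented when the selection is missing, so a rejected !new leaves the user state untouched.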
Normal message in an unbound room with selected agent
If a room exists but has no agent_id yet and the user already has selected_agent_id:
- bind the room to selected_agent_id
- ensure it has platform_chat_id
- continue normal message dispatch
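A possible shape for the binding step, as a sketch only. The function name bind_room_if_unbound and the allocate_platform_chat_id callable are hypothetical; the caller is assumed to persist room_meta and then continue with dispatch.

```python
from typing import Callable, Dict, Mapping


def bind_room_if_unbound(
    room_meta: Dict[str, object],
    user_meta: Mapping[str, object],
    allocate_platform_chat_id: Callable[[], str],
) -> bool:
    """Bind an unbound room to the user's selection; return True if bound now."""
    if room_meta.get("agent_id"):
        return False  # already bound: never retarget an existing room
    selected = user_meta.get("selected_agent_id")
    if not selected:
        return False  # nothing to bind to; caller shows the selection prompt
    room_meta["agent_id"] = selected
    if not room_meta.get("platform_chat_id"):
        # Keep an existing platform_chat_id; allocate only when it is missing.
        room_meta["platform_chat_id"] = allocate_platform_chat_id()
    return True
```

The early return for bound rooms is what keeps a later agent switch from rewriting an existing binding.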
Normal message in a bound room
If the room already has agent_id and it matches the current selected agent:
- route the message to that agent_id
- use the room's platform_chat_id
Stale room after agent switch
If the room's bound agent_id differs from the user's current selected_agent_id:
- do not call the platform
- treat the room as stale
- return a short message telling the user that this chat belongs to the old agent and that they must use !new
Returning to a previously selected agent
If the user later selects an old agent again:
- previously stale rooms do not become valid again
- the user must still create a fresh room via !new
Routing and Component Changes
Agent registry loader
Add a small loader responsible for:
- reading agents.yaml
- validating ids and labels
- exposing a read-only registry to runtime code
The runtime should not parse YAML ad hoc during message handling.
Matrix runtime pre-check
Before dispatching a normal message, the Matrix runtime must resolve:
- whether the user has selected_agent_id
- whether the current room already has agent_id
- whether the room can be bound now
- whether the room is stale
This pre-check happens before handing the message to the existing dispatcher path.
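The pre-check can be sketched as a pure classification function. RouteDecision and precheck are hypothetical names. Note that a plain comparison like this would treat a room as valid again if the user reselects its agent; making staleness permanent, as this design requires, needs some persisted marker that this document does not specify, so the sketch leaves it out.

```python
from enum import Enum
from typing import Collection, Mapping


class RouteDecision(Enum):
    NEED_SELECTION = "need_selection"  # no valid selected agent: prompt the user
    BIND_NOW = "bind_now"              # unbound room, valid selection
    DISPATCH = "dispatch"              # bound room matches the selection
    STALE = "stale"                    # bound to another agent: require !new


def precheck(
    user_meta: Mapping[str, object],
    room_meta: Mapping[str, object],
    known_agent_ids: Collection[str],
) -> RouteDecision:
    """Classify a normal message before it reaches the dispatcher."""
    selected = user_meta.get("selected_agent_id")
    if selected not in known_agent_ids:
        # Covers both "never selected" and "selected agent removed from config".
        return RouteDecision.NEED_SELECTION
    bound = room_meta.get("agent_id")
    if not bound:
        return RouteDecision.BIND_NOW
    if bound != selected:
        return RouteDecision.STALE
    return RouteDecision.DISPATCH
```

Only DISPATCH and BIND_NOW lead to a platform call; the other two outcomes end with a local reply.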
Routed platform client
The selected implementation keeps the shared PlatformClient protocol unchanged.
The Matrix runtime owns one routing-aware facade, for example RoutedPlatformClient, that implements PlatformClient and delegates to agent-specific real clients.
Responsibilities:
- resolve the current room binding from local Matrix metadata
- translate a local Matrix logical chat id into the room's platform_chat_id
- choose the correct per-agent delegate for the room's bound agent_id
- keep get_or_create_user, get_settings, and update_settings behavior stable for the rest of the runtime
This keeps the multi-agent logic inside the Matrix integration boundary instead of pushing agent selection into the shared protocol.
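The facade idea can be illustrated with a simplified sketch. The real PlatformClient signatures are not shown in this document, so the synchronous send_message(chat_id, text) method, the AgentDelegate protocol, the load_room_meta callable, and EchoDelegate are all assumptions made to keep the example runnable.

```python
from typing import Callable, Mapping, Protocol


class AgentDelegate(Protocol):
    """Hypothetical slice of PlatformClient used by this sketch."""

    def send_message(self, platform_chat_id: str, text: str) -> str: ...


class RoutedPlatformClient:
    """Facade: resolves the room binding, then forwards to that agent's delegate."""

    def __init__(
        self,
        delegates: Mapping[str, AgentDelegate],
        load_room_meta: Callable[[str], Mapping[str, str]],
    ) -> None:
        self._delegates = delegates
        self._load_room_meta = load_room_meta  # lookup by local chat id, e.g. "C3"

    def send_message(self, local_chat_id: str, text: str) -> str:
        room = self._load_room_meta(local_chat_id)
        agent_id = room["agent_id"]                  # room binding decides the agent
        platform_chat_id = room["platform_chat_id"]  # local id -> platform id
        return self._delegates[agent_id].send_message(platform_chat_id, text)


class EchoDelegate:
    """Stand-in delegate so the sketch runs without a backend."""

    def __init__(self, agent_id: str) -> None:
        self.agent_id = agent_id

    def send_message(self, platform_chat_id: str, text: str) -> str:
        return f"{self.agent_id}/{platform_chat_id}: {text}"


rooms = {"C3": {"agent_id": "agent-2", "platform_chat_id": "42"}}
client = RoutedPlatformClient(
    {"agent-1": EchoDelegate("agent-1"), "agent-2": EchoDelegate("agent-2")},
    rooms.__getitem__,
)
print(client.send_message("C3", "hello"))  # agent-2/42: hello
```

Callers keep passing the local chat id; the agent never appears in the call signature, which is why the shared protocol can stay unchanged.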
Real platform bridge delegates
The current real backend path hardcodes a single runtime-level agent_id.
That must be replaced with per-agent delegates hidden behind the routing facade.
The selected design is:
- RealPlatformClient remains the low-level direct-agent delegate for one configured agent_id
- the routing facade holds or creates one RealPlatformClient delegate per configured agent
- send_message(...) and stream_message(...) on the facade resolve the room target and forward the call to the matching delegate
- the delegate creates a fresh upstream AgentApi for its configured agent_id
- no long-lived AgentApi instances are cached by user
This preserves the current fresh-connection-per-request behavior while avoiding a protocol break for Telegram or other surfaces.
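A sketch of the delegate side, under loud assumptions: the real AgentApi interface is not shown in this document, so AgentApiLike, its send method, the api_factory parameter, build_delegates, and the CountingApi test double are all hypothetical. Only the fresh-object-per-request pattern is the point.

```python
from typing import Callable, Dict, Iterable, Protocol


class AgentApiLike(Protocol):
    """Hypothetical stand-in for the upstream AgentApi."""

    def send(self, platform_chat_id: str, text: str) -> str: ...


class RealPlatformClient:
    """Low-level delegate pinned to one configured agent_id."""

    def __init__(self, agent_id: str, api_factory: Callable[[str], AgentApiLike]) -> None:
        self.agent_id = agent_id
        self._api_factory = api_factory

    def send_message(self, platform_chat_id: str, text: str) -> str:
        api = self._api_factory(self.agent_id)  # fresh upstream object per request
        return api.send(platform_chat_id, text)


def build_delegates(
    agent_ids: Iterable[str], api_factory: Callable[[str], AgentApiLike]
) -> Dict[str, RealPlatformClient]:
    """One delegate per configured agent, keyed by agent_id."""
    return {agent_id: RealPlatformClient(agent_id, api_factory) for agent_id in agent_ids}


class CountingApi:
    """Test double that records how many instances were created."""

    created = 0

    def __init__(self, agent_id: str) -> None:
        CountingApi.created += 1
        self.agent_id = agent_id

    def send(self, platform_chat_id: str, text: str) -> str:
        return f"{self.agent_id}:{platform_chat_id}:{text}"


delegates = build_delegates(["agent-1", "agent-2"], CountingApi)
delegates["agent-2"].send_message("42", "hi")
delegates["agent-2"].send_message("42", "again")
print(CountingApi.created)  # 2: one API object per request, none cached
```

The delegates themselves are long-lived and cheap; only the upstream API object is recreated on each call.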
Error Handling
Missing or invalid selected agent
If selected_agent_id is absent:
- ask the user to select an agent
If selected_agent_id points to an agent that no longer exists in config:
- treat the selection as invalid
- ask the user to select again
Missing room binding
If the room has no agent_id:
- bind it only when the user has a valid current selection
- otherwise return the selection prompt
Stale room
If the room is stale:
- do not attempt fallback routing
- do not silently rewrite room metadata
- instruct the user to run !new
Invalid config
If the bot cannot load a valid agent registry:
- fail at startup
- do not start in degraded single-agent mode
Testing Expectations
Tests for this design should prove:
- config parsing and startup validation
- selecting an agent persists selected_agent_id
- selecting an agent inside an unbound room activates that room
- !new binds the new room to the selected agent
- messages in a bound room use that room's agent_id
- stale rooms reject normal messaging with a clear !new instruction
- returning to the same agent later does not revive stale rooms
Migration Notes
Existing rooms may have platform_chat_id but no agent_id.
For this MVP, treat those rooms as legacy-unbound rooms:
- if the user has a valid selected agent, the room may be bound on first use
- if no agent is selected, the room prompts for selection first
No automatic migration across agents is introduced.
Existing users without selected_agent_id
Existing users upgraded from the single-agent model may have working rooms but no stored selected_agent_id.
For this MVP, that is handled explicitly:
- normal messaging is paused until the user selects an agent
- the first valid selection can bind an unbound room immediately
- the surface does not auto-assign a default agent in a multi-agent config
This is intentional. Once more than one agent exists, silent migration would be ambiguous and could route a user to the wrong backend target.