Matrix Multi-Agent Routing Design
Goal
Move the Matrix surface from a single hardcoded upstream agent to a user-selectable multi-agent model, while preserving the existing room-based UX and the current PlatformClient boundary.
The result should be:
- one Matrix bot can work with multiple upstream agents
- users can choose an agent from the full configured list
- each chat is bound to exactly one agent
- switching the selected agent does not silently retarget an existing chat
Core Decision
The selected routing model is:
user.selected_agent_id + room.agent_id + room.platform_chat_id
This means:
- the user has one current selected agent
- each Matrix working room stores the agent it is bound to
- each Matrix working room stores its own platform_chat_id
- a room never changes agent implicitly
Why This Decision
The current Matrix adapter already separates:
- user-facing room organization
- local chat labels such as C1, C2, C3
- platform-facing conversation identity via platform_chat_id
Adding multi-agent support should preserve that shape instead of replacing it.
If routing depended only on the current user selection, then an old room could start talking to a different agent after a switch. That would make room history and backend context hard to reason about. Binding an agent to the room keeps the conversation model explicit.
Scope
This design covers:
- agent selection by the user inside the Matrix surface
- durable storage of the selected agent
- durable storage of the room-bound agent
- routing normal messages and context commands to the correct upstream agent
- behavior when a room becomes stale after an agent switch
This design does not cover:
- per-agent workspace isolation
- platform-side agent lifecycle or memory persistence
- per-user allowlists for available agents
- Telegram or other surfaces
Configuration Model
Agent registry
Available agents are defined in a local config file loaded once at bot startup.
Example:
agents:
  - id: agent-1
    label: Analyst
  - id: agent-2
    label: Research
  - id: agent-3
    label: Ops
Rules:
- every entry must have a stable id
- every entry must have a user-visible label
- all configured agents are selectable by all users
- config changes apply only after bot restart
Startup validation
If the agent config is missing, empty, or invalid, the Matrix bot must fail fast on startup with a clear operator error.
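The fail-fast rule could be sketched as below. This is a minimal illustration rather than the project's loader: it assumes the config file has already been parsed into plain Python data, and the names AgentConfigError and build_agent_registry are hypothetical. Rejecting duplicate ids is also an assumption, derived from the requirement that ids be stable.

```python
from types import MappingProxyType
from typing import Any, Mapping


class AgentConfigError(ValueError):
    """Raised at startup when the agent registry is unusable (hypothetical name)."""


def build_agent_registry(raw: Any) -> Mapping[str, str]:
    """Validate parsed config data and return a read-only id -> label mapping."""
    if not isinstance(raw, dict) or not isinstance(raw.get("agents"), list):
        raise AgentConfigError("agent config must contain an 'agents' list")
    if not raw["agents"]:
        raise AgentConfigError("agent config defines no agents")

    registry: dict[str, str] = {}
    for index, entry in enumerate(raw["agents"]):
        if not isinstance(entry, dict):
            raise AgentConfigError(f"agents[{index}] must be a mapping")
        agent_id = entry.get("id")
        label = entry.get("label")
        if not isinstance(agent_id, str) or not agent_id.strip():
            raise AgentConfigError(f"agents[{index}] is missing a non-empty 'id'")
        if not isinstance(label, str) or not label.strip():
            raise AgentConfigError(f"agents[{index}] is missing a non-empty 'label'")
        if agent_id in registry:
            # Assumption: duplicate ids are treated as invalid config.
            raise AgentConfigError(f"duplicate agent id: {agent_id!r}")
        registry[agent_id] = label
    # MappingProxyType gives runtime code a read-only view of the registry.
    return MappingProxyType(registry)
```

Letting AgentConfigError propagate out of the startup path would stop the bot before it connects to Matrix, which matches the requirement of a clear operator error.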
Durable State Model
User-level state
User metadata keeps the current selected agent.
Example matrix_user:* shape:
{
  "space_id": "!space:example.org",
  "next_chat_index": 4,
  "selected_agent_id": "agent-2"
}
Meaning:
- selected_agent_id controls future chat creation and activation of an unbound room
- selected_agent_id does not rewrite already bound rooms
Room-level state
Room metadata stores the agent bound to that chat.
Example matrix_room:* shape:
{
  "room_type": "chat",
  "chat_id": "C3",
  "display_name": "Чат 3",
  "matrix_user_id": "@alice:example.org",
  "space_id": "!space:example.org",
  "platform_chat_id": "42",
  "agent_id": "agent-2"
}
Rules:
- one room binds to exactly one agent_id
- one room binds to exactly one current platform_chat_id
- once a room becomes stale after an agent switch, it never becomes active again
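One way to make the optional fields explicit is a pair of typed views over the stored metadata. The sketch below is illustrative only: the class names UserState and RoomState and the helper room_from_metadata are hypothetical, and the field set simply mirrors the example shapes in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserState:
    space_id: str
    next_chat_index: int
    selected_agent_id: Optional[str] = None  # None until the user picks an agent


@dataclass(frozen=True)
class RoomState:
    room_type: str
    chat_id: str
    display_name: str
    matrix_user_id: str
    space_id: str
    platform_chat_id: Optional[str] = None
    agent_id: Optional[str] = None  # None for rooms not yet bound to an agent


def room_from_metadata(meta: dict) -> RoomState:
    """Build a RoomState, tolerating records that have no agent_id yet."""
    return RoomState(
        room_type=meta["room_type"],
        chat_id=meta["chat_id"],
        display_name=meta["display_name"],
        matrix_user_id=meta["matrix_user_id"],
        space_id=meta["space_id"],
        platform_chat_id=meta.get("platform_chat_id"),
        agent_id=meta.get("agent_id"),
    )
```

Frozen dataclasses fit the rule that a binding is never changed implicitly: any change has to go through an explicit write of new metadata.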
Runtime Semantics
!start
!start remains lightweight:
- if no agent is selected, the bot explains that an agent must be selected before normal messaging
- if an agent is already selected, the bot reports the current selection and reminds the user that !new creates a new room under that agent
!agent
Introduce an agent-selection command.
Behavior:
- !agent shows the available agent list
- agent selection stores selected_agent_id in user metadata
- after a successful switch, the bot tells the user that existing chats bound to another agent are stale and that !new is required for continued work
The exact UI can be text-first for MVP. A richer UI can be added later without changing the state model.
Normal message without selected agent
If the user has not selected an agent yet:
- do not call the platform
- return the available agent list
- ask the user to choose one first
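For a text-first MVP, the reply could be built directly from the registry. The sketch below is an assumption about presentation only: format_agent_prompt is a hypothetical helper, the wording is illustrative, and the registry is taken to be an id -> label mapping.

```python
from typing import Mapping


def format_agent_prompt(registry: Mapping[str, str]) -> str:
    """Render the agent list shown when no agent is selected."""
    lines = ["No agent is selected yet. Available agents:"]
    for agent_id, label in registry.items():
        lines.append(f"- {label} ({agent_id})")
    lines.append("Choose one with !agent before sending messages.")
    return "\n".join(lines)
```

Because this path only formats text, it never touches the platform client, which keeps the "do not call the platform" rule easy to verify.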
Selecting an agent inside an unbound chat
If the current room has never been bound to any agent:
- store the new selected_agent_id for the user
- bind the current room to that same agent_id
- allow the room to become the active working chat immediately
This avoids forcing !new for the user's first usable chat.
!new
!new creates a new working room under the current selected agent.
Behavior:
- require selected_agent_id
- create the new Matrix room
- allocate a new platform_chat_id
- store agent_id = selected_agent_id in the new room metadata
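The metadata side of !new might look like the sketch below. It leaves out the actual Matrix room creation and makes several assumptions: plan_new_room and allocate_platform_chat_id are hypothetical names, the chat label is derived from next_chat_index, and the display name format is illustrative.

```python
from typing import Callable, Tuple


def plan_new_room(
    user_meta: dict,
    matrix_user_id: str,
    allocate_platform_chat_id: Callable[[], str],
) -> Tuple[dict, dict]:
    """Return (room_meta, updated_user_meta) for a !new request.

    Raises ValueError when no agent is selected, so the caller can show
    the selection prompt instead of creating a room.
    """
    agent_id = user_meta.get("selected_agent_id")
    if not agent_id:
        raise ValueError("selected_agent_id is required before !new")

    index = user_meta["next_chat_index"]
    room_meta = {
        "room_type": "chat",
        "chat_id": f"C{index}",  # assumption: label derived from the counter
        "display_name": f"Chat {index}",
        "matrix_user_id": matrix_user_id,
        "space_id": user_meta["space_id"],
        "platform_chat_id": allocate_platform_chat_id(),
        "agent_id": agent_id,  # bound once, at creation time
    }
    # Return a copy so the caller decides when to persist the new counter.
    updated_user = {**user_meta, "next_chat_index": index + 1}
    return room_meta, updated_user
```

Checking the selection before allocating a platform_chat_id means a rejected !new leaves no orphaned upstream conversation.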
Normal message in an unbound room with selected agent
If a room exists but has no agent_id yet and the user already has selected_agent_id:
- bind the room to selected_agent_id
- ensure it has platform_chat_id
- continue normal message dispatch
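Binding on first use could be sketched as a small pure function. The names bind_room and allocate_platform_chat_id are hypothetical, and the refusal to rebind an already bound room is this sketch's way of honoring the rule that a room never changes agent implicitly.

```python
from typing import Callable


def bind_room(
    room_meta: dict,
    selected_agent_id: str,
    allocate_platform_chat_id: Callable[[], str],
) -> dict:
    """Return a copy of room_meta bound to selected_agent_id."""
    if room_meta.get("agent_id") is not None:
        raise ValueError("room is already bound to an agent")
    bound = dict(room_meta)
    bound["agent_id"] = selected_agent_id
    if not bound.get("platform_chat_id"):
        # Only allocate when the room has no platform_chat_id yet;
        # an existing id is kept as-is.
        bound["platform_chat_id"] = allocate_platform_chat_id()
    return bound
```

The caller would persist the returned metadata and then hand the message to the normal dispatch path.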
Normal message in a bound room
If the room already has agent_id and it matches the current selected agent:
- route the message to that agent_id
- use the room's platform_chat_id
Stale room after agent switch
If the room's bound agent_id differs from the user's current selected_agent_id:
- do not call the platform
- treat the room as stale
- return a short message telling the user that this chat belongs to the old agent and that they must use !new
Returning to a previously selected agent
If the user later selects an old agent again:
- previously stale rooms do not become valid again
- the user must still create a fresh room via !new
Routing and Component Changes
Agent registry loader
Add a small loader responsible for:
- reading agents.yaml
- validating ids and labels
- exposing a read-only registry to runtime code
The runtime should not parse YAML ad hoc during message handling.
Matrix runtime pre-check
Before dispatching a normal message, the Matrix runtime must resolve:
- whether the user has selected_agent_id
- whether the current room already has agent_id
- whether the room can be bound now
- whether the room is stale
This pre-check happens before handing the message to the existing dispatcher path.
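The pre-check can be expressed as a pure decision function, sketched below. RoomDecision and precheck are hypothetical names. The room_marked_stale parameter is also an assumption: the design requires that stale rooms never revive but does not say how that is remembered, so this sketch takes a persisted marker as an input rather than prescribing one.

```python
from enum import Enum
from typing import Collection, Optional


class RoomDecision(Enum):
    NEED_SELECTION = "need_selection"        # prompt the user to pick an agent
    BIND_AND_DISPATCH = "bind_and_dispatch"  # unbound room, valid selection
    DISPATCH = "dispatch"                    # bound room matches the selection
    STALE = "stale"                          # tell the user to run !new


def precheck(
    selected_agent_id: Optional[str],
    room_agent_id: Optional[str],
    known_agent_ids: Collection[str],
    room_marked_stale: bool = False,  # hypothetical persisted marker
) -> RoomDecision:
    """Decide what to do with a normal message before dispatching it."""
    # A missing selection, or one that is no longer in config, needs a new choice.
    if selected_agent_id is None or selected_agent_id not in known_agent_ids:
        return RoomDecision.NEED_SELECTION
    # A room that went stale earlier stays stale, even if the agent matches again.
    if room_marked_stale:
        return RoomDecision.STALE
    if room_agent_id is None:
        return RoomDecision.BIND_AND_DISPATCH
    if room_agent_id != selected_agent_id:
        return RoomDecision.STALE
    return RoomDecision.DISPATCH
```

Only DISPATCH and BIND_AND_DISPATCH would reach the existing dispatcher; the other two outcomes return a message to the user without calling the platform.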
Real platform bridge
The current real backend path hardcodes a single runtime-level agent_id.
That must be replaced with per-request routing.
The selected design is:
- the runtime resolves the target agent_id
- the platform bridge creates a fresh upstream AgentApi for that agent_id
- no long-lived AgentApi instances are cached by user
This preserves the current fresh-connection-per-request behavior.
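Per-request routing might be shaped as in the sketch below. The real AgentApi constructor and methods are not described in this document, so the sketch injects a factory and assumes a hypothetical send method; AgentApiLike and PlatformBridge are illustrative names, and the code is synchronous only for brevity.

```python
from typing import Callable, Protocol


class AgentApiLike(Protocol):
    """Hypothetical minimal surface of the upstream client."""

    def send(self, platform_chat_id: str, text: str) -> str: ...


class PlatformBridge:
    def __init__(self, agent_api_factory: Callable[[str], AgentApiLike]) -> None:
        # The factory builds a new client for a given agent_id.
        self._factory = agent_api_factory

    def send_message(self, agent_id: str, platform_chat_id: str, text: str) -> str:
        # A fresh client per request: nothing is cached by user or by agent.
        api = self._factory(agent_id)
        return api.send(platform_chat_id, text)
```

Taking agent_id as a call argument instead of a constructor argument is what removes the single runtime-level agent from the bridge.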
Error Handling
Missing or invalid selected agent
If selected_agent_id is absent:
- ask the user to select an agent
If selected_agent_id points to an agent that no longer exists in config:
- treat the selection as invalid
- ask the user to select again
Missing room binding
If the room has no agent_id:
- bind it only when the user has a valid current selection
- otherwise return the selection prompt
Stale room
If the room is stale:
- do not attempt fallback routing
- do not silently rewrite room metadata
- instruct the user to run !new
Invalid config
If the bot cannot load a valid agent registry:
- fail at startup
- do not start in degraded single-agent mode
Testing Expectations
Tests for this design should prove:
- config parsing and startup validation
- selecting an agent persists selected_agent_id
- selecting an agent inside an unbound room activates that room
- !new binds the new room to the selected agent
- messages in a bound room use that room's agent_id
- stale rooms reject normal messaging with a clear !new instruction
- returning to the same agent later does not revive stale rooms
Migration Notes
Existing rooms may have platform_chat_id but no agent_id.
For this MVP, treat those rooms as legacy-unbound rooms:
- if the user has a valid selected agent, the room may be bound on first use
- if no agent is selected, the room prompts for selection first
No automatic migration across agents is introduced.