Mikhail Putilovskij 32b03becc8 docs: clarify matrix multi-agent routing specs

2026-04-24 12:42:58 +03:00

Matrix Multi-Agent Routing Design

Goal

Move the Matrix surface from a single hardcoded upstream agent to a user-selectable multi-agent model, while preserving the existing room-based UX and the current PlatformClient boundary.

The result should be:

one Matrix bot can work with multiple upstream agents
users can choose an agent from the full configured list
each chat is bound to exactly one agent
switching the selected agent does not silently retarget an existing chat

Core Decision

The selected routing model is:

user.selected_agent_id + room.agent_id + room.platform_chat_id

This means:

the user has one current selected agent
each Matrix working room stores the agent it is bound to
each Matrix working room stores its own platform_chat_id
a room never changes agent implicitly
the shared PlatformClient protocol remains unchanged
Matrix multi-agent routing is implemented by a single routing facade that delegates to per-agent real clients

Why This Decision

The current Matrix adapter already separates:

user-facing room organization
local chat labels such as C1, C2, C3
platform-facing conversation identity via platform_chat_id

Adding multi-agent support should preserve that shape instead of replacing it.

If routing depended only on the current user selection, then an old room could start talking to a different agent after a switch. That would make room history and backend context hard to reason about. Binding an agent to the room keeps the conversation model explicit.

Scope

This design covers:

agent selection by the user inside the Matrix surface
durable storage of the selected agent
durable storage of the room-bound agent
routing normal messages and context commands to the correct upstream agent
behavior when a room becomes stale after an agent switch

This design does not cover:

per-agent workspace isolation
platform-side agent lifecycle or memory persistence
per-user allowlists for available agents
Telegram or other surfaces

Configuration Model

Agent registry

Available agents are defined in a local config file loaded once at bot startup.

Example:

agents:
  - id: agent-1
    label: Analyst
  - id: agent-2
    label: Research
  - id: agent-3
    label: Ops

Rules:

every entry must have a stable id
every entry must have a user-visible label
all configured agents are selectable by all users
config changes apply only after bot restart

Startup validation

If the agent config is missing, empty, or invalid, the Matrix bot must fail fast on startup with a clear operator error.
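
A minimal sketch of the validation rules above, assuming the config file has already been parsed into plain Python data by whatever YAML loader the bot uses. The names `Agent`, `AgentConfigError`, and `build_registry` are hypothetical, and rejecting duplicate ids is an assumption drawn from the "stable id" rule rather than something stated explicitly.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping


class AgentConfigError(RuntimeError):
    """Raised at startup when the agent registry is unusable (hypothetical name)."""


@dataclass(frozen=True)
class Agent:
    id: str
    label: str


def build_registry(raw: Any) -> Mapping[str, Agent]:
    """Validate already-parsed config data and return a read-only registry."""
    entries = raw.get("agents") if isinstance(raw, dict) else None
    if not isinstance(entries, list) or not entries:
        raise AgentConfigError("agent config must contain a non-empty 'agents' list")

    registry: dict[str, Agent] = {}
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise AgentConfigError(f"agents[{index}] must be a mapping")
        agent_id, label = entry.get("id"), entry.get("label")
        if not isinstance(agent_id, str) or not agent_id.strip():
            raise AgentConfigError(f"agents[{index}] is missing a non-empty 'id'")
        if not isinstance(label, str) or not label.strip():
            raise AgentConfigError(f"agents[{index}] ({agent_id}) is missing a non-empty 'label'")
        # Assumption: a "stable id" must also be unique within the file.
        if agent_id in registry:
            raise AgentConfigError(f"duplicate agent id: {agent_id}")
        registry[agent_id] = Agent(id=agent_id, label=label)

    # MappingProxyType gives runtime code a read-only view of the registry.
    return MappingProxyType(registry)
```

Calling this once during startup and letting `AgentConfigError` propagate would give the fail-fast behavior described above.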

Durable State Model

User-level state

User metadata keeps the current selected agent.

Example matrix_user:* shape:

{
  "space_id": "!space:example.org",
  "next_chat_index": 4,
  "selected_agent_id": "agent-2"
}

Meaning:

selected_agent_id controls future chat creation and activation of an unbound room
selected_agent_id does not rewrite already bound rooms

Room-level state

Room metadata stores the agent bound to that chat.

Example matrix_room:* shape:

{
  "room_type": "chat",
  "chat_id": "C3",
  "display_name": "Чат 3",
  "matrix_user_id": "@alice:example.org",
  "space_id": "!space:example.org",
  "platform_chat_id": "42",
  "agent_id": "agent-2"
}

Rules:

one room binds to exactly one agent_id
one room binds to exactly one current platform_chat_id
once a room becomes stale after an agent switch, it never becomes active again

Runtime Semantics

`!start`

!start remains lightweight:

if no agent is selected, the bot explains that an agent must be selected before normal messaging
if an agent is already selected, the bot reports the current selection and reminds the user that !new creates a new room under that agent

`!agent`

Introduce an agent-selection command.

Behavior:

!agent shows the available agent list
agent selection stores selected_agent_id in user metadata
after a successful switch, the bot tells the user that existing chats bound to another agent are stale and that !new is required for continued work

The exact UI can be text-first for MVP. A richer UI can be added later without changing the state model.

Normal message without selected agent

If the user has not selected an agent yet:

do not call the platform
return the available agent list
ask the user to choose one first

This is an intentional one-time routing handshake, not an accidental fallback. In a multi-agent deployment, the surface must not silently guess which agent an unbound user should talk to.

Selecting an agent inside an unbound chat

If the current room has never been bound to any agent:

store the new selected_agent_id for the user
bind the current room to that same agent_id
allow the room to become the active working chat immediately

This avoids forcing !new for the user's first usable chat.
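
The selection step can be sketched as follows. Plain dicts stand in for the matrix_user:* and matrix_room:* records, and the function name and return values are hypothetical; the real storage layer and reply texts are not shown in this document.

```python
from typing import Optional, Set


def select_agent(
    user_meta: dict,
    room_meta: Optional[dict],
    agent_id: str,
    known_agent_ids: Set[str],
) -> str:
    """Store the user's selection and bind the current room if it is still unbound."""
    if agent_id not in known_agent_ids:
        # Unknown ids never overwrite a previous valid selection.
        return "unknown_agent"

    user_meta["selected_agent_id"] = agent_id

    # Only a room that has never been bound may be activated by a selection.
    if room_meta is not None and room_meta.get("agent_id") is None:
        room_meta["agent_id"] = agent_id
        return "selected_and_bound"

    # Rooms that are already bound keep their agent_id untouched.
    return "selected"
```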

`!new`

!new creates a new working room under the current selected agent.

Behavior:

require selected_agent_id
create the new Matrix room
allocate a new platform_chat_id
store agent_id = selected_agent_id in the new room metadata
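
A rough sketch of these steps, under several assumptions: `create_matrix_room` and `allocate_platform_chat_id` are hypothetical callables standing in for the Matrix and platform calls, and deriving the local label from `next_chat_index` is inferred from the example metadata rather than specified here.

```python
from typing import Callable, Tuple


class SelectionRequired(Exception):
    """Raised when !new is used before an agent has been selected (hypothetical)."""


def handle_new(
    user_meta: dict,
    matrix_user_id: str,
    create_matrix_room: Callable[[str], str],
    allocate_platform_chat_id: Callable[[], str],
) -> Tuple[str, dict]:
    """Create a working room bound to the user's currently selected agent."""
    selected = user_meta.get("selected_agent_id")
    if not selected:
        raise SelectionRequired("select an agent with !agent before using !new")

    # Assumption: local labels C1, C2, ... come from the user's next_chat_index.
    index = user_meta.get("next_chat_index", 1)
    chat_id = f"C{index}"
    room_id = create_matrix_room(chat_id)

    room_meta = {
        "room_type": "chat",
        "chat_id": chat_id,
        "matrix_user_id": matrix_user_id,
        "space_id": user_meta.get("space_id"),
        "platform_chat_id": allocate_platform_chat_id(),
        "agent_id": selected,  # the binding is fixed at creation time
    }
    user_meta["next_chat_index"] = index + 1
    return room_id, room_meta
```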

Normal message in an unbound room with selected agent

If a room exists but has no agent_id yet and the user already has selected_agent_id:

bind the room to selected_agent_id
ensure it has platform_chat_id
continue normal message dispatch

Normal message in a bound room

If the room already has agent_id, that agent_id matches the current selected agent, and the room has not gone stale after an earlier switch:

route the message to that agent_id
use the room's platform_chat_id

Stale room after agent switch

If the room's bound agent_id differs from the user's current selected_agent_id, or the room already went stale after an earlier switch:

do not call the platform
treat the room as stale
return a short message telling the user that this chat belongs to the old agent and that they must use !new

Returning to a previously selected agent

If the user later selects an old agent again:

previously stale rooms do not become valid again
the user must still create a fresh room via !new
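
Comparing agent_id with selected_agent_id alone would make an old room look valid again once the user switches back, so some durable marker is needed. One possible approach, sketched below, is a `stale` flag written at switch time. This flag is a hypothetical addition that does not appear in the room shape shown earlier; agent_id and platform_chat_id are left untouched.

```python
from typing import Iterable, Optional


def mark_stale_rooms(rooms: Iterable[dict], matrix_user_id: str, new_agent_id: str) -> int:
    """On an agent switch, flag the user's rooms that are bound to another agent."""
    marked = 0
    for room in rooms:
        if room.get("matrix_user_id") != matrix_user_id:
            continue
        bound = room.get("agent_id")
        if bound is not None and bound != new_agent_id and not room.get("stale"):
            room["stale"] = True  # hypothetical durable flag; never cleared
            marked += 1
    return marked


def is_stale(room: dict, selected_agent_id: Optional[str]) -> bool:
    """A room is stale if it was ever flagged or is bound to a different agent."""
    bound = room.get("agent_id")
    return bool(room.get("stale")) or (bound is not None and bound != selected_agent_id)
```

With this shape, switching from agent-1 to agent-2 and back leaves the original agent-1 rooms flagged, so they still require !new.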

Routing and Component Changes

Agent registry loader

Add a small loader responsible for:

reading agents.yaml
validating ids and labels
exposing a read-only registry to runtime code

The runtime should not parse YAML ad hoc during message handling.

Matrix runtime pre-check

Before dispatching a normal message, the Matrix runtime must resolve:

whether the user has selected_agent_id
whether the current room already has agent_id
whether the room can be bound now
whether the room is stale

This pre-check happens before handing the message to the existing dispatcher path.
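
The four questions above can be folded into one decision function. This is a sketch rather than the actual runtime code: `Route` and `precheck` are hypothetical names, dicts stand in for the stored metadata, and the optional `stale` key is an assumed durable flag, since this document does not say how a stale room is remembered.

```python
from enum import Enum
from typing import Set


class Route(Enum):
    NEEDS_SELECTION = "needs_selection"        # reply with the agent list
    BIND_THEN_DISPATCH = "bind_then_dispatch"  # bind the room, then dispatch
    STALE = "stale"                            # reply with the !new instruction
    DISPATCH = "dispatch"                      # hand over to the dispatcher


def precheck(user_meta: dict, room_meta: dict, known_agent_ids: Set[str]) -> Route:
    """Decide what to do with a normal message before any platform call."""
    selected = user_meta.get("selected_agent_id")
    # A missing selection and a selection removed from config are handled alike.
    if selected is None or selected not in known_agent_ids:
        return Route.NEEDS_SELECTION

    bound = room_meta.get("agent_id")
    if bound is None:
        # Covers new rooms and legacy rooms that only have platform_chat_id.
        return Route.BIND_THEN_DISPATCH

    # Assumption: "stale" is a durable flag; a mismatch alone is also stale.
    if room_meta.get("stale") or bound != selected:
        return Route.STALE

    return Route.DISPATCH
```

Keeping the function free of side effects means binding and replies stay in the caller, which makes each branch easy to test in isolation.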

Routed platform client

The selected implementation keeps the shared PlatformClient protocol unchanged.

The Matrix runtime owns one routing-aware facade, for example RoutedPlatformClient, that implements PlatformClient and delegates to agent-specific real clients.

Responsibilities:

resolve the current room binding from local Matrix metadata
translate a local Matrix logical chat id into the room's platform_chat_id
choose the correct per-agent delegate for the room's bound agent_id
keep get_or_create_user, get_settings, and update_settings behavior stable for the rest of the runtime

This keeps the multi-agent logic inside the Matrix integration boundary instead of pushing agent selection into the shared protocol.
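
A simplified sketch of the facade's routing step. The real PlatformClient method signatures are not shown in this document, so the synchronous `send_message(matrix_user_id, chat_id, text)` shape, the `AgentDelegate` protocol, the `UnroutableChat` error, and the in-memory room lookup are all assumptions made for illustration.

```python
from typing import Mapping, Protocol, Tuple


class AgentDelegate(Protocol):
    """Hypothetical per-agent client interface."""

    def send_message(self, platform_chat_id: str, text: str) -> str: ...


class UnroutableChat(LookupError):
    """Raised when a chat has no usable room binding (hypothetical)."""


class RoutedPlatformClient:
    def __init__(
        self,
        delegates: Mapping[str, AgentDelegate],
        rooms: Mapping[Tuple[str, str], dict],
    ) -> None:
        self._delegates = delegates  # agent_id -> per-agent delegate
        self._rooms = rooms          # (matrix_user_id, chat_id) -> room metadata

    def send_message(self, matrix_user_id: str, chat_id: str, text: str) -> str:
        # 1. Resolve the room binding from local Matrix metadata.
        room = self._rooms.get((matrix_user_id, chat_id))
        if room is None or not room.get("agent_id") or not room.get("platform_chat_id"):
            raise UnroutableChat(f"chat {chat_id} has no agent binding")

        # 2. Pick the delegate for the room's bound agent, not the user's selection.
        delegate = self._delegates.get(room["agent_id"])
        if delegate is None:
            raise UnroutableChat(f"agent {room['agent_id']} is not configured")

        # 3. Translate the local chat id into the room's platform_chat_id.
        return delegate.send_message(room["platform_chat_id"], text)
```

Callers keep passing the local chat label, so the rest of the runtime does not need to know which agent a room is bound to.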

Real platform bridge delegates

The current real backend path hardcodes a single runtime-level agent_id. That must be replaced with per-agent delegates hidden behind the routing facade.

The selected design is:

RealPlatformClient remains the low-level direct-agent delegate for one configured agent_id
the routing facade holds or creates one RealPlatformClient delegate per configured agent
send_message(...) and stream_message(...) on the facade resolve the room target and forward the call to the matching delegate
the delegate creates a fresh upstream AgentApi for its configured agent_id
no long-lived AgentApi instances are cached by user

This preserves the current fresh-connection-per-request behavior while avoiding a protocol break for Telegram or other surfaces.
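
The fresh-connection-per-request rule might look like the sketch below. `PerAgentDelegate` is a stand-in for the per-agent role of RealPlatformClient, and `AgentApiLike` with its `send` and `close` methods is hypothetical: the real AgentApi constructor and methods are not described in this document.

```python
from typing import Callable, Dict, Iterable, Protocol


class AgentApiLike(Protocol):
    """Hypothetical stand-in for the upstream AgentApi."""

    def send(self, platform_chat_id: str, text: str) -> str: ...

    def close(self) -> None: ...


class PerAgentDelegate:
    """Low-level client fixed to exactly one configured agent_id."""

    def __init__(self, agent_id: str, api_factory: Callable[[str], AgentApiLike]) -> None:
        self.agent_id = agent_id
        self._api_factory = api_factory

    def send_message(self, platform_chat_id: str, text: str) -> str:
        # A fresh API object per request; nothing is cached per user.
        api = self._api_factory(self.agent_id)
        try:
            return api.send(platform_chat_id, text)
        finally:
            api.close()


def build_delegates(
    agent_ids: Iterable[str],
    api_factory: Callable[[str], AgentApiLike],
) -> Dict[str, PerAgentDelegate]:
    """Create one delegate per configured agent for the routing facade to hold."""
    return {agent_id: PerAgentDelegate(agent_id, api_factory) for agent_id in agent_ids}
```

Only the delegates are long-lived here; each one holds an agent_id and a factory, not a connection.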

Error Handling

Missing or invalid selected agent

If selected_agent_id is absent:

ask the user to select an agent

If selected_agent_id points to an agent that no longer exists in config:

treat the selection as invalid
ask the user to select again

Missing room binding

If the room has no agent_id:

bind it only when the user has a valid current selection
otherwise return the selection prompt

Stale room

If the room is stale:

do not attempt fallback routing
do not silently rewrite room metadata
instruct the user to run !new

Invalid config

If the bot cannot load a valid agent registry:

fail at startup
do not start in degraded single-agent mode

Testing Expectations

Tests for this design should cover:

config parsing and startup validation
selecting an agent persists selected_agent_id
selecting an agent inside an unbound room activates that room
!new binds the new room to the selected agent
messages in a bound room use that room's agent_id
stale rooms reject normal messaging with a clear !new instruction
returning to the same agent later does not revive stale rooms

Migration Notes

Existing rooms may have platform_chat_id but no agent_id.

For this MVP, treat those rooms as legacy-unbound rooms:

if the user has a valid selected agent, the room may be bound on first use
if no agent is selected, the room prompts for selection first

No automatic migration across agents is introduced.

Existing users without `selected_agent_id`

Existing users upgraded from the single-agent model may have working rooms but no stored selected_agent_id.

For this MVP, that is handled explicitly:

normal messaging is paused until the user selects an agent
the first valid selection can bind an unbound room immediately
the surface does not auto-assign a default agent in a multi-agent config

This is intentional. Once more than one agent exists, silent migration would be ambiguous and could route a user to the wrong backend target.
