surfaces/docs/superpowers/specs/2026-04-24-matrix-multi-agent-routing-design.md

Matrix Multi-Agent Routing Design

Goal

Move the Matrix surface from a single hardcoded upstream agent to a user-selectable multi-agent model, while preserving the existing room-based UX and the current PlatformClient boundary.

The result should be:

  • one Matrix bot can work with multiple upstream agents
  • users can choose an agent from the full configured list
  • each chat is bound to exactly one agent
  • switching the selected agent does not silently retarget an existing chat

Core Decision

The selected routing model is:

user.selected_agent_id + room.agent_id + room.platform_chat_id

This means:

  • the user has one current selected agent
  • each Matrix working room stores the agent it is bound to
  • each Matrix working room stores its own platform_chat_id
  • a room never changes agent implicitly

Why This Decision

The current Matrix adapter already separates:

  • user-facing room organization
  • local chat labels such as C1, C2, C3
  • platform-facing conversation identity via platform_chat_id

Adding multi-agent support should preserve that shape instead of replacing it.

If routing depended only on the current user selection, then an old room could start talking to a different agent after a switch. That would make room history and backend context hard to reason about. Binding an agent to the room keeps the conversation model explicit.

Scope

This design covers:

  • agent selection by the user inside the Matrix surface
  • durable storage of the selected agent
  • durable storage of the room-bound agent
  • routing normal messages and context commands to the correct upstream agent
  • behavior when a room becomes stale after an agent switch

This design does not cover:

  • per-agent workspace isolation
  • platform-side agent lifecycle or memory persistence
  • per-user allowlists for available agents
  • Telegram or other surfaces

Configuration Model

Agent registry

Available agents are defined in a local config file (agents.yaml) that is loaded once at bot startup.

Example:

agents:
  - id: agent-1
    label: Analyst
  - id: agent-2
    label: Research
  - id: agent-3
    label: Ops

Rules:

  • every entry must have a stable id
  • every entry must have a user-visible label
  • all configured agents are selectable by all users
  • config changes apply only after bot restart

Startup validation

If the agent config is missing, empty, or invalid, the Matrix bot must fail fast on startup with a clear operator error.
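The registry rules and the fail-fast requirement can be sketched as a single validation step. This is a minimal sketch, not the actual loader: `build_registry`, `AgentEntry`, and `AgentConfigError` are hypothetical names, the function takes already-parsed config data (reading the YAML file is left out), and rejecting duplicate ids is an assumption drawn from the "stable id" rule rather than something the design states.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping


class AgentConfigError(Exception):
    """Raised at startup when the agent registry is missing, empty, or invalid."""


@dataclass(frozen=True)
class AgentEntry:
    id: str
    label: str


def build_registry(raw: Any) -> Mapping[str, AgentEntry]:
    """Validate parsed config data and return a read-only id -> entry mapping."""
    if not isinstance(raw, dict) or not isinstance(raw.get("agents"), list):
        raise AgentConfigError("agent config must contain an 'agents' list")
    if not raw["agents"]:
        raise AgentConfigError("agent config defines no agents")

    registry: dict[str, AgentEntry] = {}
    for index, item in enumerate(raw["agents"]):
        if not isinstance(item, dict):
            raise AgentConfigError(f"agents[{index}] must be a mapping")
        agent_id, label = item.get("id"), item.get("label")
        if not isinstance(agent_id, str) or not agent_id.strip():
            raise AgentConfigError(f"agents[{index}] has no usable id")
        if not isinstance(label, str) or not label.strip():
            raise AgentConfigError(f"agents[{index}] has no usable label")
        if agent_id in registry:
            # Assumption: ids must be unique for routing to be unambiguous.
            raise AgentConfigError(f"duplicate agent id: {agent_id}")
        registry[agent_id] = AgentEntry(agent_id, label)

    # MappingProxyType prevents runtime code from mutating the registry.
    return MappingProxyType(registry)
```

Startup code would call this once and let `AgentConfigError` abort the process, so the bot never runs with a partial registry.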

Durable State Model

User-level state

User metadata keeps the current selected agent.

Example matrix_user:* shape:

{
  "space_id": "!space:example.org",
  "next_chat_index": 4,
  "selected_agent_id": "agent-2"
}

Meaning:

  • selected_agent_id controls future chat creation and activation of an unbound room
  • selected_agent_id does not rewrite already bound rooms

Room-level state

Room metadata stores the agent bound to that chat.

Example matrix_room:* shape:

{
  "room_type": "chat",
  "chat_id": "C3",
  "display_name": "Чат 3",
  "matrix_user_id": "@alice:example.org",
  "space_id": "!space:example.org",
  "platform_chat_id": "42",
  "agent_id": "agent-2"
}

Rules:

  • one room binds to exactly one agent_id
  • one room binds to exactly one current platform_chat_id
  • once a room becomes stale after an agent switch, it never becomes active again

Runtime Semantics

!start

!start remains lightweight:

  • if no agent is selected, the bot explains that an agent must be selected before normal messaging
  • if an agent is already selected, the bot reports the current selection and reminds the user that !new creates a new room under that agent

!agent

Introduce an agent-selection command.

Behavior:

  • !agent shows the available agent list
  • agent selection stores selected_agent_id in user metadata
  • after a successful switch, the bot tells the user that existing chats bound to another agent are stale and that !new is required for continued work

The exact UI can be text-first for MVP. A richer UI can be added later without changing the state model.

Normal message without selected agent

If the user has not selected an agent yet:

  • do not call the platform
  • return the available agent list
  • ask the user to choose one first

Selecting an agent inside an unbound chat

If the current room has never been bound to any agent:

  • store the new selected_agent_id for the user
  • bind the current room to that same agent_id
  • allow the room to become the active working chat immediately

This avoids forcing !new for the user's first usable chat.
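The selection behavior described so far can be sketched as one function over the two metadata records. It is a hedged illustration only: `select_agent`, the reply wording, and the idea that the chosen id arrives as a plain string are assumptions, since the design leaves the exact command UI open.

```python
def select_agent(user_meta: dict, room_meta: dict, agent_id: str,
                 known_agent_ids: set[str]) -> str:
    """Apply an agent selection and return the reply text for the user."""
    if agent_id not in known_agent_ids:
        # Unknown ids leave all state untouched.
        return "Unknown agent. Available: " + ", ".join(sorted(known_agent_ids))

    previous = user_meta.get("selected_agent_id")
    user_meta["selected_agent_id"] = agent_id

    # A room that was never bound becomes the active working chat right away.
    if room_meta.get("agent_id") is None:
        room_meta["agent_id"] = agent_id
        return f"Agent {agent_id} selected. This chat is now bound to it."

    # Bound rooms are never rewritten by a selection change.
    if previous is not None and previous != agent_id:
        return (f"Agent {agent_id} selected. Chats bound to another agent "
                "are stale; use !new to continue.")
    return f"Agent {agent_id} selected. Use !new to create a chat under it."
```

Note that the function writes to `room_meta` only in the unbound case, which matches the rule that a room never changes agent implicitly.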

!new

!new creates a new working room under the current selected agent.

Behavior:

  1. require selected_agent_id
  2. create the new Matrix room
  3. allocate a new platform_chat_id
  4. store agent_id = selected_agent_id in the new room metadata
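The four steps above can be sketched as follows. This is an illustration under stated assumptions: `handle_new` is a hypothetical name, room creation and `platform_chat_id` allocation are passed in as callables because the design does not show those APIs, and deriving the `C<n>` label from `next_chat_index` is inferred from the example metadata rather than specified.

```python
from typing import Callable, Optional


def handle_new(user_meta: dict,
               create_room: Callable[[str], str],
               allocate_platform_chat_id: Callable[[], str]) -> Optional[dict]:
    """Create a new working room bound to the user's selected agent.

    Returns None when no agent is selected, so the caller can show the
    selection prompt instead.
    """
    # Step 1: require selected_agent_id.
    selected = user_meta.get("selected_agent_id")
    if selected is None:
        return None

    # Step 2: create the new Matrix room (hypothetical callable).
    index = user_meta.get("next_chat_index", 1)
    chat_id = f"C{index}"
    room_id = create_room(chat_id)

    room_meta = {
        "room_type": "chat",
        "chat_id": chat_id,
        "space_id": user_meta.get("space_id"),
        # Step 3: allocate a new platform_chat_id (hypothetical callable).
        "platform_chat_id": allocate_platform_chat_id(),
        # Step 4: bind the room to the selected agent.
        "agent_id": selected,
    }
    user_meta["next_chat_index"] = index + 1
    return {"room_id": room_id, "meta": room_meta}
```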

Normal message in an unbound room with selected agent

If a room exists but has no agent_id yet and the user already has selected_agent_id:

  • bind the room to selected_agent_id
  • ensure it has platform_chat_id
  • continue normal message dispatch

Normal message in a bound room

If the room already has agent_id, that agent_id matches the current selected agent, and the room is not stale:

  • route the message to that agent_id
  • use the room's platform_chat_id

Stale room after agent switch

If the room's bound agent_id differs from the user's current selected_agent_id:

  • do not call the platform
  • treat the room as stale
  • return a short message telling the user that this chat is bound to a different agent and that they must use !new to continue

Returning to a previously selected agent

If the user later selects an old agent again:

  • previously stale rooms do not become valid again
  • the user must still create a fresh room via !new

Routing and Component Changes

Agent registry loader

Add a small loader responsible for:

  • reading agents.yaml
  • validating ids and labels
  • exposing a read-only registry to runtime code

The runtime should not parse YAML ad hoc during message handling.

Matrix runtime pre-check

Before dispatching a normal message, the Matrix runtime must resolve:

  • whether the user has selected_agent_id
  • whether the current room already has agent_id
  • whether the room can be bound now
  • whether the room is stale

This pre-check happens before handing the message to the existing dispatcher path.
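One way to picture the pre-check is as a pure function that maps the two metadata records to a routing outcome. The names `Route` and `precheck` are hypothetical. The `stale` key is also an assumption: the design does not say how staleness is persisted, and it appears here only because comparing the two ids alone cannot keep a room stale once the user returns to the agent it was bound to.

```python
from enum import Enum


class Route(Enum):
    NEED_SELECTION = "need_selection"          # prompt with the agent list
    BIND_THEN_DISPATCH = "bind_then_dispatch"  # unbound room, valid selection
    DISPATCH = "dispatch"                      # bound room, matching agent
    STALE = "stale"                            # tell the user to run !new


def precheck(user_meta: dict, room_meta: dict, known_agent_ids: set) -> Route:
    """Decide what to do with a normal message before dispatching it."""
    selected = user_meta.get("selected_agent_id")
    # An absent selection and one that left the config are handled alike.
    if selected is None or selected not in known_agent_ids:
        return Route.NEED_SELECTION

    bound = room_meta.get("agent_id")
    if bound is None:
        # Covers both new rooms and legacy rooms that predate agent_id.
        return Route.BIND_THEN_DISPATCH

    # Hypothetical durable marker, or a plain mismatch with the selection.
    if room_meta.get("stale") or bound != selected:
        return Route.STALE

    return Route.DISPATCH
```

Keeping the function free of side effects means the platform is never called and room metadata is never rewritten as part of the decision itself; the caller acts on the returned value.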

Real platform bridge

The current real backend path hardcodes a single runtime-level agent_id. That must be replaced with per-request routing.

The selected design is:

  • the runtime resolves the target agent_id
  • the platform bridge creates a fresh upstream AgentApi for that agent_id
  • no long-lived AgentApi instances are cached per user

This preserves the current fresh-connection-per-request behavior.
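A rough sketch of per-request routing in the bridge follows. The real AgentApi interface is not shown in this document, so the `send` method, the `PlatformBridge` class, and the factory parameter are all hypothetical stand-ins; the real path is likely asynchronous, and the sketch is synchronous only to stay short.

```python
from typing import Callable, Protocol


class AgentApi(Protocol):
    """Hypothetical minimal view of the upstream client."""

    def send(self, platform_chat_id: str, text: str) -> str: ...


class PlatformBridge:
    """Routes each request to the agent resolved by the runtime."""

    def __init__(self, api_factory: Callable[[str], AgentApi]) -> None:
        # The factory builds a client for one agent_id; nothing is cached.
        self._api_factory = api_factory

    def send_message(self, agent_id: str, platform_chat_id: str, text: str) -> str:
        # A fresh client per request, so no agent_id is fixed at runtime level.
        api = self._api_factory(agent_id)
        return api.send(platform_chat_id, text)
```

Injecting the factory keeps the bridge testable with a fake client and leaves connection details out of the routing decision.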

Error Handling

Missing or invalid selected agent

If selected_agent_id is absent:

  • ask the user to select an agent

If selected_agent_id points to an agent that no longer exists in config:

  • treat the selection as invalid
  • ask the user to select again

Missing room binding

If the room has no agent_id:

  • bind it only when the user has a valid current selection
  • otherwise return the selection prompt

Stale room

If the room is stale:

  • do not attempt fallback routing
  • do not silently rewrite room metadata
  • instruct the user to run !new

Invalid config

If the bot cannot load a valid agent registry:

  • fail at startup
  • do not start in degraded single-agent mode

Testing Expectations

Tests for this design should cover:

  • config parsing and startup validation
  • selecting an agent persists selected_agent_id
  • selecting an agent inside an unbound room activates that room
  • !new binds the new room to the selected agent
  • messages in a bound room use that room's agent_id
  • stale rooms reject normal messaging with a clear !new instruction
  • returning to the same agent later does not revive stale rooms

Migration Notes

Existing rooms may have platform_chat_id but no agent_id.

For this MVP, treat those rooms as legacy-unbound rooms:

  • if the user has a valid selected agent, the room may be bound on first use
  • if no agent is selected, the room prompts for selection first

No automatic migration across agents is introduced.