docs: clarify matrix multi-agent routing specs

Mikhail Putilovskij 2026-04-24 12:42:58 +03:00
parent 842117900a
commit 32b03becc8
2 changed files with 53 additions and 5 deletions

@@ -23,6 +23,8 @@ This means:
- each Matrix working room stores the agent it is bound to
- each Matrix working room stores its own `platform_chat_id`
- a room never changes agent implicitly
- the shared `PlatformClient` protocol remains unchanged
- Matrix multi-agent routing is implemented by a single routing facade that delegates to per-agent real clients
## Why This Decision
@@ -156,6 +158,9 @@ If the user has not selected an agent yet:
- return the available agent list
- ask the user to choose one first
This is an intentional one-time routing handshake, not an accidental fallback.
In a multi-agent deployment, the surface must not silently guess which agent an unbound user should talk to.
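The handshake described here can be sketched as a small pre-dispatch check. This is a minimal illustration, not the actual implementation: `RoutingDecision` and `check_agent_selection` are hypothetical names, and the sketch assumes the user's stored selection and the configured agent list are already loaded.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RoutingDecision:
    """Outcome of the pre-dispatch check (hypothetical type)."""
    dispatch: bool
    agent_id: Optional[str] = None
    prompt: Optional[str] = None


def check_agent_selection(
    selected_agent_id: Optional[str],
    available_agents: Sequence[str],
) -> RoutingDecision:
    """Dispatch only when the user has explicitly selected a known agent."""
    if selected_agent_id in available_agents:
        return RoutingDecision(dispatch=True, agent_id=selected_agent_id)
    # No guessing: return the agent list and ask the user to choose first.
    listing = ", ".join(available_agents)
    return RoutingDecision(
        dispatch=False,
        prompt=f"Choose an agent first. Available agents: {listing}",
    )
```

The sketch never falls back to a default agent, which mirrors the rule that the surface must not guess on behalf of an unbound user.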
### Selecting an agent inside an unbound chat
If the current room has never been bound to any agent:
@@ -230,18 +235,35 @@ Before dispatching a normal message, the Matrix runtime must resolve:
This pre-check happens before handing the message to the existing dispatcher path.
### Real platform bridge
### Routed platform client
The selected implementation keeps the shared `PlatformClient` protocol unchanged.
The Matrix runtime owns one routing-aware facade, for example `RoutedPlatformClient`, that implements `PlatformClient` and delegates to agent-specific real clients.
Responsibilities:
- resolve the current room binding from local Matrix metadata
- translate a local Matrix logical chat id into the room's `platform_chat_id`
- choose the correct per-agent delegate for the room's bound `agent_id`
- keep `get_or_create_user`, `get_settings`, and `update_settings` behavior stable for the rest of the runtime
This keeps the multi-agent logic inside the Matrix integration boundary instead of pushing agent selection into the shared protocol.
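A rough sketch of such a facade follows. It rests on several assumptions: the real `PlatformClient` protocol is not shown in this change, so the `send_message(chat_id, text)` signature, the synchronous style, the `RoomBinding` record, and `UnboundRoomError` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class PlatformClient(Protocol):
    """Assumed subset of the shared protocol; the real one has more methods."""

    def send_message(self, chat_id: str, text: str) -> str: ...


@dataclass(frozen=True)
class RoomBinding:
    """Local Matrix metadata for one working room (hypothetical shape)."""
    agent_id: str
    platform_chat_id: str


class UnboundRoomError(Exception):
    """Raised when a message arrives for a room with no agent binding."""


class RoutedPlatformClient:
    """Facade that satisfies PlatformClient and forwards to per-agent delegates."""

    def __init__(
        self,
        bindings: Dict[str, RoomBinding],
        delegates: Dict[str, PlatformClient],
    ) -> None:
        self._bindings = bindings    # local chat id -> room binding
        self._delegates = delegates  # agent_id -> real client

    def send_message(self, chat_id: str, text: str) -> str:
        binding = self._bindings.get(chat_id)
        if binding is None:
            raise UnboundRoomError(chat_id)
        # Pick the delegate for the bound agent and translate the chat id.
        delegate = self._delegates[binding.agent_id]
        return delegate.send_message(binding.platform_chat_id, text)
```

Because the facade exposes the same method shape as its delegates, callers that only know `PlatformClient` need no changes.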
### Real platform bridge delegates
The current real backend path hardcodes a single runtime-level `agent_id`.
That must be replaced with per-request routing.
That must be replaced with per-agent delegates hidden behind the routing facade.
The selected design is:
- the runtime resolves the target `agent_id`
- the platform bridge creates a fresh upstream `AgentApi` for that `agent_id`
- `RealPlatformClient` remains the low-level direct-agent delegate for one configured `agent_id`
- the routing facade holds or creates one `RealPlatformClient` delegate per configured agent
- `send_message(...)` and `stream_message(...)` on the facade resolve the room target and forward the call to the matching delegate
- the delegate creates a fresh upstream `AgentApi` for its configured `agent_id`
- no long-lived `AgentApi` instances are cached by user
This preserves the current fresh-connection-per-request behavior.
This preserves the current fresh-connection-per-request behavior while avoiding a protocol break for Telegram or other surfaces.
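The delegate side of this design might look roughly like the following. The `AgentApi` class here is only a stand-in, since the upstream constructor and call surface are not shown in this change; `DelegateRegistry` and the `send` method are likewise hypothetical names. The point of the sketch is the lifetime rule: one delegate per configured agent, but a fresh `AgentApi` for every request.

```python
from typing import Callable, Dict, Iterable


class AgentApi:
    """Stand-in for the upstream API object (real interface not shown)."""

    instances_created = 0  # counter used only to make lifetimes observable

    def __init__(self, agent_id: str) -> None:
        AgentApi.instances_created += 1
        self.agent_id = agent_id

    def send(self, platform_chat_id: str, text: str) -> str:
        return f"[{self.agent_id}/{platform_chat_id}] {text}"


class RealPlatformClient:
    """Low-level delegate bound to exactly one configured agent_id."""

    def __init__(
        self,
        agent_id: str,
        api_factory: Callable[[str], AgentApi] = AgentApi,
    ) -> None:
        self._agent_id = agent_id
        self._api_factory = api_factory

    def send_message(self, platform_chat_id: str, text: str) -> str:
        # A fresh AgentApi per call; nothing is cached between requests.
        api = self._api_factory(self._agent_id)
        return api.send(platform_chat_id, text)


class DelegateRegistry:
    """Holds or lazily creates one delegate per configured agent."""

    def __init__(self, configured_agent_ids: Iterable[str]) -> None:
        self._configured = set(configured_agent_ids)
        self._delegates: Dict[str, RealPlatformClient] = {}

    def get(self, agent_id: str) -> RealPlatformClient:
        if agent_id not in self._configured:
            raise KeyError(f"agent is not configured: {agent_id}")
        if agent_id not in self._delegates:
            self._delegates[agent_id] = RealPlatformClient(agent_id)
        return self._delegates[agent_id]
```

Delegates are cheap to keep around because they hold only configuration; the connection-bearing object is still created per request.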
## Error Handling
@@ -300,3 +322,15 @@ For this MVP, treat those rooms as legacy-unbound rooms:
- if no agent is selected, the room prompts for selection first
No automatic migration across agents is introduced.
### Existing users without `selected_agent_id`
Existing users upgraded from the single-agent model may have working rooms but no stored `selected_agent_id`.
For this MVP, that is handled explicitly:
- normal messaging is paused until the user selects an agent
- the first valid selection can bind an unbound room immediately
- the surface does not auto-assign a default agent in a multi-agent config
This is intentional. Once more than one agent exists, silent migration would be ambiguous and could route a user to the wrong backend target.
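As an illustration of the upgrade path, the selection step could be sketched as below. `UserState`, `RoomState`, and `apply_agent_selection` are hypothetical names. The sketch assumes that a room that is already bound is left untouched, in line with the rule that a room never changes agent implicitly; how the spec treats an explicit selection inside a bound room is not covered here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class UserState:
    selected_agent_id: Optional[str] = None  # missing for upgraded users


@dataclass
class RoomState:
    agent_id: Optional[str] = None  # None means legacy-unbound


def apply_agent_selection(
    user: UserState,
    room: RoomState,
    agent_id: str,
    configured_agents: Sequence[str],
) -> bool:
    """Record an explicit selection; bind the room only if it is unbound."""
    if agent_id not in configured_agents:
        return False  # invalid choice: nothing changes, messaging stays paused
    user.selected_agent_id = agent_id
    if room.agent_id is None:
        room.agent_id = agent_id  # first valid selection binds immediately
    return True
```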

@@ -64,6 +64,7 @@ The Matrix surface must persist:
- `matrix_user:*`
- `matrix_room:*`
- `chat:*`
- `PLATFORM_CHAT_SEQ_KEY`
- `selected_agent_id`
- room-bound `agent_id`
- room-bound `platform_chat_id`
@@ -74,6 +75,7 @@ This is the minimal state required so that, after restart, the surface can:
- identify the room
- determine which agent should receive a message
- determine which `platform_chat_id` should be used
- continue allocating new `platform_chat_id` values without reusing an already issued sequence number
### Non-durable state
@@ -155,6 +157,17 @@ Example:
}
```
### Platform chat sequence
The global `PLATFORM_CHAT_SEQ_KEY` remains part of durable surface state.
Its purpose is:
- allocate monotonically increasing `platform_chat_id` values
- avoid reusing a previously issued platform chat identifier during normal restart or redeploy
This sequence must be stored in the same durable surface store as the room and user metadata.
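A minimal sketch of such an allocator is shown below, under some assumptions: the durable store is modelled as a plain mapping, `platform_chat_id` values are treated as integers, and the string value given to `PLATFORM_CHAT_SEQ_KEY` is made up for the example. A real store would also need an atomic increment if several allocations can run concurrently, which this sketch does not attempt.

```python
from typing import MutableMapping

# Assumed key value; the real constant's contents are not shown in the spec.
PLATFORM_CHAT_SEQ_KEY = "platform_chat_seq"


def allocate_platform_chat_id(store: MutableMapping[str, int]) -> int:
    """Return the next platform_chat_id and persist the new high-water mark."""
    next_id = store.get(PLATFORM_CHAT_SEQ_KEY, 0) + 1
    # Write the counter back before the id is used, so a restart that
    # reloads the same store continues after the last issued value.
    store[PLATFORM_CHAT_SEQ_KEY] = next_id
    return next_id
```

Keeping the counter in the same store as the room and user records means a backup or restore moves all three together, so a restored room cannot point at an id the sequence would issue again.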
## Runtime Semantics After Restart
After restart, the Matrix surface must:
@@ -186,6 +199,7 @@ The multi-agent design introduces new durable state that must survive restart:
- `selected_agent_id` on the user
- `agent_id` on the room
- `PLATFORM_CHAT_SEQ_KEY` in the surface store
Restart persistence and multi-agent routing therefore belong together.