surfaces/.planning/phases/05-mvp-deployment/05-RESEARCH.md

# Phase 05: MVP Deployment — Research
**Researched:** 2026-04-27
**Domain:** Matrix bot deployment — config refactor, DM-first onboarding, file transfer, docker-compose prod topology
**Confidence:** HIGH (all findings verified against actual codebase)

---
<user_constraints>
## User Constraints (from CONTEXT.md)
### Locked Decisions
**Single-chat architecture**
- D-01: chat_id=0 for all messages. One agent context per user. `!clear` resets context.
- D-02: Delete all multi-room infrastructure: C1/C2/C3, `!new`, `!archive`, `!rename`, Space-creation, room provisioning. Matrix bot operates only in DM room.
- D-03: Delete `!save` and `!load` — unreliable without persistent memory in agent.
**Onboarding (DM-first)**
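- D-04: On DM invite: accept, then send the welcome message: "Привет! Я Lambda AI-агент. Просто напиши — и я отвечу. `!clear` чтобы начать новый разговор, `!context` чтобы посмотреть статус." (English: "Hi! I'm the Lambda AI agent. Just write and I'll reply. `!clear` starts a new conversation, `!context` shows the status.")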
- D-05: No Space, no child rooms. All conversation in one DM room.
**!clear (new command)**
- D-06: Reset agent context: close the current AgentApi connection and create a new one (`await agent.close()` + `await agent.connect()`). Confirm with: "Контекст сброшен. Начнём с чистого листа." (English: "Context reset. Let's start from a clean slate.")
- D-11: No confirmation dialog — immediate reset.
**!agent command**
- D-07: Delete completely. user→agent mapping is static from config.
**Agent config (config/matrix-agents.yaml)**
- D-02 (config): Extend current matrix-agents.yaml — add user_agents dict and base_url/workspace_path fields per agent.
- D-03 (schema): AgentDefinition gains `base_url: str` and `workspace_path: str`. AgentRegistry adds `user_agents: dict[matrix_user_id, agent_id]` and `get_agent_id_by_user(matrix_user_id)`.
**Routing user → agent in _build_platform_from_env**
- D-04 (routing): Per-agent URL from config instead of global AGENT_BASE_URL. `_build_platform_from_env` builds delegates with correct base_url per agent. `RoutedPlatformClient._resolve_delegate` uses user_agents from registry.
**Incoming files (user → agent)**
- D-05 (files): Path inside agent workspace: `incoming/{filename}`. Absolute: `{workspace_path}/incoming/{filename}`. Update `files.py`: `build_workspace_attachment_path` takes agent workspace_path and builds `incoming/{filename}`. Pass to `agent.send_message()` as `attachments=["incoming/{filename}"]` (relative to /workspace).
- D-06 (files): workspace_path is taken from AgentDefinition by user's agent_id.
**Outgoing files (agent → user)**
- D-07 (files): On `MsgEventSendFile(path="output/report.pdf")` — read from `{workspace_path}/{path}`. Send as Matrix file message.
**docker-compose for prod**
- D-08: `docker-compose.prod.yml` includes: Matrix bot + agent container (placeholder image `lambda-agent:latest`) + named volume `agents`.
- D-09: Named volume `agents` mounted in Matrix bot as `/agents/` and in agent container as `/workspace`. Env vars from `.env.prod`. Start: `docker compose -f docker-compose.prod.yml up`.
**Unauthorized users**
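- D-10: If the Matrix user_id is not in `user_agents`: accept the invite and reply: "К вашему аккаунту не привязан агент. Напишите @og_mput в Telegram для получения доступа." (English: "No agent is linked to your account. Message @og_mput on Telegram to get access.") Ignore further messages (or repeat the message).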
**!settings and other settings commands**
- D-12: Delete `!settings`, `!settings soul`, `!settings skills`, `!settings safety`.
### Claude's Discretion
- MATRIX_AGENT_REGISTRY_PATH — keep as env var for config path (already exists)
- Format of .env.prod
- Group room invites (non-DM) — reject automatically
- Existing Space+rooms for old users — ignore, do not migrate
### Deferred Ideas (OUT OF SCOPE)
- platform-master integration (dynamic `get_agent_url` via POST /api/v1/create) — when feat/storage is ready
- !agent as admin-override — not needed for MVP
- Per-chat context isolation via different chat_id (currently chat_id=0) — waiting for platform signal
</user_constraints>
---
## Summary
Phase 05 is a code-and-config refactor of the existing Matrix adapter. There is no new framework to learn — the full stack (matrix-nio, AgentApi, docker-compose) is already in use. The work is: (1) simplify the data model from multi-room to single DM room per user, (2) extend AgentRegistry with per-user routing and per-agent URLs/paths, (3) reroute file I/O to the shared `/agents/` volume, (4) write a prod docker-compose, and (5) delete substantial legacy code (Space provisioning, C1/C2/C3, !agent, !save, !load, !settings).
The current codebase has 35 failing tests (pre-existing on `feat/deploy`), mostly in `test_dispatcher.py`, `test_invite_space.py`, `test_routed_platform.py` — all testing behaviors that Phase 05 will delete or replace. New tests must cover the simplified DM-first invite flow, the user_agents lookup path, and the new file path logic. Existing passing tests (203) must stay green.
**Primary recommendation:** Execute as three sequential mini-plans: (A) config/registry extension + routing, (B) DM-first onboarding + !clear + legacy deletion, (C) file transfer + docker-compose.prod.yml + .env.prod.

---
## Standard Stack
All libraries are already installed and in use. No new dependencies.
### Core (already in pyproject.toml)
| Library | Version | Purpose | Source |
|---------|---------|---------|--------|
| matrix-nio | installed | Matrix client — join rooms, send messages, upload files | [VERIFIED: adapter/matrix/bot.py imports] |
| pyyaml | installed | YAML config parsing in AgentRegistry | [VERIFIED: agent_registry.py line 7] |
| aiohttp | installed | WebSocket transport inside AgentApi | [VERIFIED: external/platform-agent_api/lambda_agent_api/agent_api.py] |
| structlog | installed | Structured logging | [VERIFIED: bot.py imports] |
| python-dotenv | installed | .env loading | [VERIFIED: bot.py line 79] |
### AgentApi (external, local path)
`external/platform-agent_api/lambda_agent_api/agent_api.py` — imported via `sdk/upstream_agent_api.py` which patches `sys.path`.
**Verified constructor signature** [VERIFIED: agent_api.py]:
```python
AgentApi(
    agent_id: str,
    base_url: str,  # ws://host:port/agent_N/
    callback: Optional[Callable] = None,
    on_disconnect: Optional[Callable[["AgentApi"], None]] = None,
    chat_id: int = 0,
)
```
**Key AgentApi facts** [VERIFIED: agent_api.py]:
- `self.url = urljoin(base_url, f"v1/agent_ws/{chat_id}/")` — builds WebSocket URL automatically from base_url + chat_id
- `await agent.connect()` — must be called before `send_message()`
- `await agent.close()` — explicit close; triggers `on_disconnect` callback, drains queue
- `async for event in agent.send_message(text, attachments=["incoming/file.pdf"])` — attachments are paths relative to `/workspace`
- `agent.id` attribute (not `agent_id`) — used as dict key in connection pool
**Lifecycle for !clear** [VERIFIED: agent_api.py `close()` + `connect()`]:
There are two options. (1) Call `close()`, which triggers `on_disconnect` and removes the instance from the pool; the next message recreates it. (2) For an immediate reset, call `close()` then `connect()` on the same instance; this is safe because the `_connected` flag is reset in `_cleanup()`.

---
## Architecture Patterns
### Existing Code to Modify (not rewrite)
```
adapter/matrix/
  agent_registry.py        # extend AgentDefinition + AgentRegistry
  bot.py                   # _build_platform_from_env, handle_invite, _materialize_incoming_attachments
  routed_platform.py       # _resolve_delegate (add user_agents lookup)
  files.py                 # build_workspace_attachment_path (new path logic)
  room_router.py           # resolve_chat_id (chat_id=0 for DM-first, no C1/C2/C3 lookup needed)
  handlers/
    agent.py               # DELETE or make no-op
    auth.py                # replace provision_workspace_chat with simple DM-accept
    context_commands.py    # DELETE make_handle_save, make_handle_load; keep make_handle_context
    settings.py            # DELETE or strip handle_settings, handle_settings_soul, etc.
    __init__.py            # unregister deleted commands
config/
  matrix-agents.yaml       # extend format
docker-compose.prod.yml    # new file
.env.prod                  # new file (or .env.example update)
```
### Pattern 1: AgentRegistry Extension
Current `AgentDefinition` has only `agent_id` and `label`. New fields needed [VERIFIED: CONTEXT.md D-03]:
```python
# adapter/matrix/agent_registry.py
@dataclass(frozen=True)
class AgentDefinition:
    agent_id: str
    label: str
    base_url: str  # ws://lambda.coredump.ru:7000/agent_0/
    workspace_path: str  # /agents/0/


class AgentRegistry:
    def __init__(
        self,
        agents: list[AgentDefinition],
        user_agents: dict[str, str],  # Matrix user_id -> agent_id
    ) -> None:
        self.agents = tuple(agents)
        self._by_id = {agent.agent_id: agent for agent in self.agents}
        self.user_agents = user_agents  # NEW

    def get_agent_id_by_user(self, matrix_user_id: str) -> str | None:  # NEW
        return self.user_agents.get(matrix_user_id)
```
### Pattern 2: _build_platform_from_env with Per-Agent URLs
Current code uses `_agent_base_url_from_env()` globally for all delegates [VERIFIED: bot.py lines 148-161]. New pattern:
```python
def _build_platform_from_env(*, store: StateStore, chat_mgr: ChatManager) -> PlatformClient:
    backend = os.environ.get("MATRIX_PLATFORM_BACKEND", "mock").strip().lower()
    if backend == "real":
        prototype_state = PrototypeStateStore()
        registry = _load_agent_registry_from_env(required=True)
        assert registry is not None
        delegates = {
            agent.agent_id: RealPlatformClient(
                agent_id=agent.agent_id,
                agent_base_url=agent.base_url,  # PER-AGENT URL from config
                prototype_state=prototype_state,
                platform="matrix",
            )
            for agent in registry.agents
        }
        return RoutedPlatformClient(
            chat_mgr=chat_mgr,
            store=store,
            delegates=delegates,
            registry=registry,  # pass registry for user_agents lookup
        )
    return MockPlatformClient()
```
### Pattern 3: RoutedPlatformClient._resolve_delegate (user_agents lookup)
Current implementation [VERIFIED: routed_platform.py lines 80-110] resolves agent via `room_meta.get("agent_id")` — requires the room to be pre-bound to an agent. New DM-first model: look up agent_id from `user_agents` dict by Matrix user_id.
The `_resolve_delegate` signature receives `user_id` (Matrix user_id string) and `local_chat_id` (room_id in DM-first model). New logic:
```python
async def _resolve_delegate(
    self, user_id: str, local_chat_id: str
) -> tuple[PlatformClient, str]:
    # 1. Look up agent_id by Matrix user_id
    agent_id = self._registry.get_agent_id_by_user(user_id)
    if agent_id is None:
        raise PlatformError(
            f"no agent configured for user: {user_id}",
            code="MATRIX_USER_NOT_CONFIGURED",
        )
    # 2. Get delegate
    delegate = self._delegates.get(agent_id)
    if delegate is None:
        raise PlatformError(f"unknown agent: {agent_id}", code="MATRIX_AGENT_NOT_FOUND")
    # 3. chat_id=0 always (single-chat arch, D-01)
    return delegate, "0"
```
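
To make the three routing outcomes concrete, here is a minimal, runnable sketch of the same lookup with stand-in classes. `Router` and this `PlatformError` are simplified stand-ins written for this example (the project's real classes have more responsibilities), and the user IDs are made up.

```python
import asyncio


class PlatformError(Exception):
    """Stand-in for the project's PlatformError (assumed shape)."""

    def __init__(self, message: str, code: str) -> None:
        super().__init__(message)
        self.code = code


class Router:
    """Stand-in for RoutedPlatformClient, reduced to the routing step."""

    def __init__(self, user_agents: dict[str, str], delegates: dict[str, object]) -> None:
        self._user_agents = user_agents
        self._delegates = delegates

    async def resolve(self, user_id: str) -> tuple[object, str]:
        agent_id = self._user_agents.get(user_id)
        if agent_id is None:
            raise PlatformError(f"no agent configured for user: {user_id}", "MATRIX_USER_NOT_CONFIGURED")
        delegate = self._delegates.get(agent_id)
        if delegate is None:
            raise PlatformError(f"unknown agent: {agent_id}", "MATRIX_AGENT_NOT_FOUND")
        return delegate, "0"  # single-chat architecture (D-01)


async def demo() -> list[str]:
    router = Router(
        # user1 points at an agent that has no delegate, to show the second error.
        user_agents={"@user0:example.org": "agent-0", "@user1:example.org": "agent-9"},
        delegates={"agent-0": "delegate-for-agent-0"},
    )
    outcomes = []
    for user in ("@user0:example.org", "@user1:example.org", "@stranger:example.org"):
        try:
            delegate, chat_id = await router.resolve(user)
            outcomes.append(f"{delegate}/{chat_id}")
        except PlatformError as exc:
            outcomes.append(exc.code)
    return outcomes


outcomes = asyncio.run(demo())
print(outcomes)
```
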
### Pattern 4: DM-First Invite Handler
Replace `handle_invite` + `provision_workspace_chat` in `auth.py` [VERIFIED: auth.py lines 122-163]:
```python
async def handle_invite(client, room, event, platform, store, auth_mgr, chat_mgr, registry) -> None:
    matrix_user_id = getattr(event, "sender", "")

    # Reject group rooms (non-DM) - Claude's discretion.
    # `is_direct` is a hint in the invite's m.room.member content, not in m.room.create.
    # With this fallback, rooms lacking the attribute are treated as DMs (see note below).
    is_dm = getattr(room, "is_direct", True)
    if not is_dm:
        await client.room_leave(room.room_id)
        return

    await client.join(room.room_id)

    # Check authorization via the user_agents lookup.
    # `registry` has to be threaded into the handler (see Pitfall 2).
    if registry.get_agent_id_by_user(matrix_user_id) is None:
        await client.room_send(room.room_id, "m.room.message", {
            "msgtype": "m.text",
            "body": "К вашему аккаунту не привязан агент. Напишите @og_mput в Telegram для получения доступа.",
        })
        return

    # Idempotent: don't send welcome twice
    meta = await get_room_meta(store, room.room_id)
    if meta and meta.get("welcomed"):
        return
    await set_room_meta(store, room.room_id, {
        "matrix_user_id": matrix_user_id,
        "chat_id": "0",  # single-chat: chat_id=0 always
        "welcomed": True,
    })
    await client.room_send(room.room_id, "m.room.message", {
        "msgtype": "m.text",
        "body": "Привет! Я Lambda AI-агент. Просто напиши — и я отвечу. !clear чтобы начать новый разговор, !context чтобы посмотреть статус.",
    })
```
**Note on is_direct detection:** matrix-nio's `InviteMemberEvent` has no `is_direct` attribute; when the inviter sets the flag, it is only a hint inside the event's raw content. The `MatrixRoom` object has `room_type`, and DM rooms created by the client typically have `join_rule = "invite"` and a member count of 2, but neither property proves that a room is a DM. A safer approach: accept all invites and check `user_agents` for authorization. Group room detection is a Claude's Discretion item; the simplest implementation is to skip it in Phase 05 and reject only unauthorized users.
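
The "accept all invites, authorize by `user_agents`" approach reduces the invite handler to a three-way decision. The sketch below isolates that decision as a pure function so it can be tested without a Matrix client; `InviteAction` and `decide_invite_action` are hypothetical names introduced for this example.

```python
from enum import Enum


class InviteAction(Enum):
    DENY = "deny"        # join, send the access-denied text (D-10)
    WELCOME = "welcome"  # join, send the welcome text once (D-04)
    NOOP = "noop"        # join only; the welcome was already sent


def decide_invite_action(
    sender: str,
    user_agents: dict[str, str],
    already_welcomed: bool,
) -> InviteAction:
    """Hypothetical helper: choose the reply for an invite without DM detection."""
    if sender not in user_agents:
        return InviteAction.DENY
    if already_welcomed:
        return InviteAction.NOOP
    return InviteAction.WELCOME


users = {"@user0:example.org": "agent-0"}
first_invite = decide_invite_action("@user0:example.org", users, already_welcomed=False)
repeat_invite = decide_invite_action("@user0:example.org", users, already_welcomed=True)
stranger = decide_invite_action("@stranger:example.org", users, already_welcomed=False)
```
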
### Pattern 5: File Path for Incoming Attachments
Current `build_workspace_attachment_path` [VERIFIED: files.py lines 31-46] builds:
`surfaces/matrix/{safe_user}/{safe_room}/inbox/{stamp}-{filename}`
New path needed [VERIFIED: CONTEXT.md D-05]:
`incoming/{filename}` (relative), absolute: `{workspace_path}/incoming/{filename}`
New signature:
```python
def build_workspace_attachment_path(
    *,
    workspace_path: str,  # agent's workspace_path from AgentDefinition, e.g. "/agents/0/"
    filename: str,
    timestamp: str | None = None,
) -> tuple[str, Path]:
    """Returns (relative_path_for_agent, absolute_path_for_download)."""
    stamp = timestamp or datetime.now(UTC).strftime("%Y%m%d-%H%M%S")
    safe_name = _sanitize_component(filename) or "attachment.bin"
    relative_path = f"incoming/{stamp}-{safe_name}"  # relative to /workspace
    absolute_path = Path(workspace_path) / relative_path
    return relative_path, absolute_path
```
**Callers:** `download_matrix_attachment()` in files.py and `_materialize_incoming_attachments()` in bot.py. Both need to receive `workspace_path` (from `AgentDefinition`). The bot must resolve `agent_id` for the sender before downloading — requires `registry.get_agent_id_by_user(matrix_user_id)`.
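
A small usage sketch of the new path logic follows. The body of `_sanitize_component` is not shown in this document, so the version here is an assumption (keep the basename, replace unusual characters); the timestamp is passed explicitly to keep the result deterministic.

```python
import re
from pathlib import Path


def _sanitize_component(value: str) -> str:
    """Assumed behaviour: keep the basename and replace unusual characters."""
    basename = value.replace("\\", "/").rsplit("/", 1)[-1]
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", basename)
    return cleaned.strip(".")


def build_workspace_attachment_path(
    *, workspace_path: str, filename: str, timestamp: str
) -> tuple[str, Path]:
    """Return (path relative to the agent workspace, absolute path for the bot)."""
    safe_name = _sanitize_component(filename) or "attachment.bin"
    relative_path = f"incoming/{timestamp}-{safe_name}"
    return relative_path, Path(workspace_path) / relative_path


# A hostile filename is reduced to its basename before it is joined.
rel, abs_path = build_workspace_attachment_path(
    workspace_path="/agents/0/", filename="../../etc/passwd", timestamp="20260427-120000"
)

# A filename that sanitizes to nothing falls back to the default name.
rel_empty, _ = build_workspace_attachment_path(
    workspace_path="/agents/0/", filename="..", timestamp="20260427-120000"
)
```
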
### Pattern 6: Outgoing Files (MsgEventSendFile handling)
Current `send_message` in `sdk/real.py` [VERIFIED: real.py lines 88-98] already calls `_attachment_from_send_file_event` but the result goes into `MessageResponse.attachments` — which `OutgoingMessage.attachments` then carries. The `send_outgoing()` in bot.py [VERIFIED: bot.py lines 656-686] already handles `event.attachments` by resolving `attachment.workspace_path` via `resolve_workspace_attachment_path(workspace_root, ...)`.
**Current problem:** `workspace_root` is `Path(os.environ.get("SURFACES_WORKSPACE_DIR", "/workspace"))` — a global, not per-agent. With shared volume `/agents/`, the agent workspace is `/agents/0/`, `/agents/1/`, etc.
**Fix strategy:** When processing `MsgEventSendFile(path="output/report.pdf")` for agent N, the absolute path is `/agents/N/output/report.pdf`. The `workspace_path` stored in `Attachment` (from `_attachment_from_send_file_event`) is `"output/report.pdf"`. The `workspace_root` passed to `resolve_workspace_attachment_path` must be the agent's `workspace_path` (e.g. `/agents/0/`).
**Two options:**
1. Store absolute path directly in `Attachment.workspace_path` (simplest — no env var needed)
2. Pass per-agent workspace_root through context
Option 1 is simpler: in `_attachment_from_send_file_event`, when building `Attachment`, set `workspace_path` to the absolute path (`{agent_workspace_path}/output/report.pdf`). The `resolve_workspace_attachment_path` function already handles absolute paths [VERIFIED: files.py line 87-90: `if path.is_absolute(): return path`].
This means `RealPlatformClient` needs to know the agent's `workspace_path` — pass it in constructor.
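
Option 1 can be sketched as a single path-resolution helper. `resolve_outgoing_path` is a hypothetical name, and the containment check (refusing paths that leave the agent's directory) is an addition made for this example, not something the current code is known to do.

```python
from pathlib import Path


def resolve_outgoing_path(agent_workspace_path: str, event_path: str) -> Path:
    """Hypothetical helper: map an agent-reported path to the bot's filesystem."""
    root = Path(agent_workspace_path).resolve()
    # The agent may report either "output/x" or "/workspace/output/x".
    relative = event_path.removeprefix("/workspace/").lstrip("/")
    candidate = (root / relative).resolve()
    # Assumption: refuse anything that escapes the agent's own directory.
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes workspace: {event_path}")
    return candidate


report = resolve_outgoing_path("/agents/0/", "output/report.pdf")
prefixed = resolve_outgoing_path("/agents/0/", "/workspace/output/report.pdf")
```
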
### Pattern 7: !clear Command
New handler in `context_commands.py` (or new `clear.py`):
```python
def make_handle_clear(agent_pool: dict[str, AgentApi]):
    async def handle_clear(event: IncomingCommand, auth_mgr, platform, chat_mgr, settings_mgr):
        # The "platform" here is RoutedPlatformClient.
        # Need to access the underlying RealPlatformClient and its AgentApi.
        # Two approaches:
        # A) Give RoutedPlatformClient a reset_agent(user_id) method
        # B) Access delegate directly via platform._delegates[agent_id]
        agent_id = platform._registry.get_agent_id_by_user(event.user_id)
        if agent_id and agent_id in platform._delegates:
            delegate = platform._delegates[agent_id]
            await delegate.reset_agent()  # new method on RealPlatformClient
        return [OutgoingMessage(chat_id=event.chat_id, text="Контекст сброшен.")]

    return handle_clear
```
**reset_agent() on RealPlatformClient:** The intent is to close the active AgentApi connection. However, `RealPlatformClient` currently creates a fresh `AgentApi` per request (see `_build_chat_api`; there is no connection pool) [VERIFIED: real.py lines 173-178], so on the bot side there is nothing to close. The next `send_message` builds a new `AgentApi(chat_id="0")` and reconnects anyway.
**chat_id type:** `RealPlatformClient._build_chat_api` passes `chat_id` as a string [VERIFIED: real.py line 177: `chat_id=str(chat_id)`], while the `AgentApi` constructor declares `chat_id: int = 0`. The `urljoin(base_url, f"v1/agent_ws/{chat_id}/")` call produces `v1/agent_ws/0/` either way.
**Actual reset mechanism with current RealPlatformClient:** Because a new AgentApi is created per `send_message()` call (stateless client pattern), the conversation context lives in the remote agent's `MemorySaver`, keyed by `chat_id`. Since chat_id is always 0, that state persists across connections, so reconnecting alone does not clear it. The `!reset` command already calls `disconnect_chat` [VERIFIED: context_commands.py `make_handle_reset`]; `!clear` can reuse this pattern by calling `platform.disconnect_chat("0")` if available, or simply confirm immediately.
**Implication:** A true context reset with MemorySaver requires the agent to restart or to use a different chat_id. For the Phase 05 MVP, `!clear` can (a) reply "Контекст сброшен." to the user and (b) be documented as best-effort until the agent side supports a reset. This matches D-11 (immediate, no confirmation dialog).
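
Approach A from the handler comments (a `reset_agent(user_id)` method on the routed client) could look roughly like the sketch below. Everything here is a stand-in: `FakeDelegate` and `FakeRoutedPlatform` replace the real clients, and the assumption that a delegate exposes `disconnect_chat(chat_id)` comes from the `!reset` pattern mentioned above, not from a verified signature.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeDelegate:
    """Stand-in for RealPlatformClient; records disconnect requests."""

    disconnected: list[str] = field(default_factory=list)

    async def disconnect_chat(self, chat_id: str) -> None:
        self.disconnected.append(chat_id)


@dataclass
class FakeRoutedPlatform:
    """Stand-in for RoutedPlatformClient with approach A (reset_agent)."""

    user_agents: dict[str, str]
    delegates: dict[str, FakeDelegate]

    async def reset_agent(self, user_id: str) -> bool:
        agent_id = self.user_agents.get(user_id)
        delegate = self.delegates.get(agent_id) if agent_id else None
        if delegate is None:
            return False
        await delegate.disconnect_chat("0")  # chat_id is always "0" (D-01)
        return True


async def handle_clear(platform: FakeRoutedPlatform, user_id: str) -> str:
    # Best effort: the remote MemorySaver may still hold the old context.
    await platform.reset_agent(user_id)
    return "Контекст сброшен."


delegate = FakeDelegate()
platform = FakeRoutedPlatform({"@user0:example.org": "agent-0"}, {"agent-0": delegate})
reply = asyncio.run(handle_clear(platform, "@user0:example.org"))
```
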
### Pattern 8: docker-compose.prod.yml
```yaml
services:
  matrix-bot:
    image: surfaces-bot:latest
    build: .
    env_file: .env.prod
    volumes:
      - agents:/agents/
      - ./config:/app/config:ro
    restart: unless-stopped
  agent-0:
    image: lambda-agent:latest
    env_file: .env.prod
    environment:
      AGENT_ID: "agent-0"
    volumes:
      - agents:/workspace
    restart: unless-stopped
volumes:
  agents:
    driver: local
```
**Note:** `lambda-agent:latest` is a placeholder image name per D-08. The platform team owns the actual image.
### Anti-Patterns to Avoid
- **Do not replace the per-request AgentApi with a long-lived connection pool**: the current `RealPlatformClient` creates a fresh `AgentApi` per request (stateless), which is the intended pattern. Don't change it for Phase 05.
- **Do not add chat_id logic** — single-chat arch means chat_id=0 always. Any code that increments or stores platform_chat_ids in room_meta is legacy being deleted.
- **Do not try to detect is_direct at invite time via matrix-nio** — the library's InviteMemberEvent doesn't expose this reliably. Accept all invites, authorize by user_agents lookup.
- **Do not change sdk/real.py AgentApi constructor call** — `_build_chat_api` uses `chat_id=str(chat_id)`. Keep as is; the AgentApi accepts string-coercible chat_id.
---
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| File upload to Matrix | Custom HTTP multipart | `client.upload(handle, content_type, filename, filesize)` | matrix-nio provides this; already used in bot.py send_outgoing |
| Matrix file message | Custom m.room.message | `client.room_send(room_id, "m.room.message", {"msgtype": "m.file", ...})` | Already implemented in send_outgoing |
| YAML parsing | Custom parser | `yaml.safe_load()` (already in agent_registry.py) | Already works; just extend the schema |
| WebSocket to agent | Custom aiohttp ws | `AgentApi` from external/platform-agent_api | Already used via sdk/real.py |
---
## Common Pitfalls
### Pitfall 1: `_materialize_incoming_attachments` uses global SURFACES_WORKSPACE_DIR
**What goes wrong:** Bot downloads file to `/workspace/surfaces/matrix/...` (old path) when it should write to `/agents/0/incoming/...`.
**Why it happens:** `_materialize_incoming_attachments` in bot.py [VERIFIED: bot.py line 449] reads `SURFACES_WORKSPACE_DIR` env var. In prod, this needs to be `/agents/` — but the per-user path varies.
**How to avoid:** Pass the agent's `workspace_path` (from `AgentDefinition`) into `download_matrix_attachment`. The bot must resolve `matrix_user_id → agent_id → AgentDefinition.workspace_path` before calling download. The `registry` object is available in `build_runtime()` but not currently threaded into `MatrixBot._materialize_incoming_attachments`. Either (a) store registry on `MatrixRuntime`, or (b) pass it into `MatrixBot.__init__`.
### Pitfall 2: AgentRegistry reference not available in handlers
**What goes wrong:** `handle_invite`, `_check_agent_routing`, `_materialize_incoming_attachments` all need the registry to look up user_agents. Currently registry is loaded in `build_runtime()` and passed only to `register_matrix_handlers`.
**Why it happens:** `MatrixBot` doesn't store the registry. Only the dispatcher gets it.
**How to avoid:** Store `registry: AgentRegistry | None` on `MatrixRuntime`. Thread it into `MatrixBot`.
### Pitfall 3: Existing tests test behaviors being deleted
**What goes wrong:** 35 currently failing tests (pre-existing) test Space provisioning, !agent, C1/C2/C3, !save/!load. After deletion, these tests must be deleted or replaced.
**Why it happens:** The test suite was written for the old multi-room architecture.
**How to avoid:** Plan explicitly identifies which test files to delete/rewrite:
- Delete: `test_invite_space.py`, `test_agent_handler.py`, `test_chat_space.py`
- Rewrite: `test_dispatcher.py` (large — slim to DM-first behavior), `test_routed_platform.py` (update to user_agents lookup)
- Update: `test_files.py` (new path format)
- Keep: `test_converter.py`, `test_store.py`, `test_restart_persistence.py`, `test_routing_enforcement.py`, `test_context_commands.py` (partial)
### Pitfall 4: `resolve_chat_id` returns C1/C2/C3 chat IDs
**What goes wrong:** `room_router.resolve_chat_id` [VERIFIED: room_router.py] reads `room_meta.get("chat_id")`. Old room_meta stores `"C1"`, `"C2"` etc. In DM-first model, chat_id is always `"0"`.
**How to avoid:** Update `set_room_meta` calls in the new invite handler to set `"chat_id": "0"`. The `resolve_chat_id` function can remain as-is — it will return `"0"` when that's what's stored.
### Pitfall 5: `RoutedPlatformClient._resolve_delegate` expects room_meta with agent_id
**What goes wrong:** Current `_resolve_delegate` [VERIFIED: routed_platform.py lines 80-110] reads `room_meta.get("agent_id")` — requires the room to have been pre-bound. In DM-first model with user_agents lookup, rooms are never explicitly bound.
**How to avoid:** Replace the agent_id lookup with `registry.get_agent_id_by_user(user_id)`. The `user_id` parameter is the Matrix user_id string, which is already passed into `send_message()` / `stream_message()`.
### Pitfall 6: `RealPlatformClient` needs workspace_path for outgoing file resolution
**What goes wrong:** When agent emits `MsgEventSendFile(path="output/report.pdf")`, the current `_attachment_from_send_file_event` strips `/workspace/` prefix [VERIFIED: real.py lines 207-218] leaving `"output/report.pdf"`. Then `send_outgoing` in bot.py resolves it with `SURFACES_WORKSPACE_DIR` — which doesn't know which agent's workspace to use.
**How to avoid:** Add `workspace_path: str` to `RealPlatformClient.__init__`. In `_attachment_from_send_file_event`, build absolute path: `Path(workspace_path) / event.path`. Store absolute path in `Attachment.workspace_path`. `resolve_workspace_attachment_path` already returns absolute paths unchanged [VERIFIED: files.py line 87-90].
### Pitfall 7: docker-compose.prod.yml volume mount collision
**What goes wrong:** If `/agents/` named volume is used and the agent container also mounts it as `/workspace`, all agents share the same volume root. Agent-0 writes to `/workspace/output/`, Agent-1 also writes to `/workspace/output/` — collision.
**Why it happens:** Named volume `agents` is mounted as `/workspace` in ALL agent containers.
**How to avoid:** Give each agent container its own volume or subpath. Options: (1) subpath mounts via the long-syntax `volume.subpath` option, available in newer Docker Compose releases (listed here as v2.17+; confirm against the installed version); (2) separate named volumes per agent (`agents_0`, `agents_1`); (3) configure the agent container with `WORKSPACE_SUBDIR` so that it uses `/workspace/{agent_id}/`. Per D-08 there is one placeholder agent container, so isolation between agents is a platform concern. The simplest Phase 05 setup (one `agents` volume, mounted at `/agents/` in the bot and at `/workspace` in agent-0, with `workspace_path: "/agents/0/"` in config) has a path offset to account for: the bot writes to `/agents/0/incoming/`, which the agent sees as `/workspace/0/incoming/`, not `/workspace/incoming/`.
**Topology per deploy-architecture.md** [VERIFIED: docs/deploy-architecture.md]:
- Volume `agents` mounted in bot as `/agents/`
- Volume `agents` mounted in agent-0 as `/workspace`
- Agent workspace_path in config: `/agents/0/`
- Bot writes file to `/agents/0/incoming/photo.jpg`
- Agent sees that file at `/workspace/0/incoming/photo.jpg`, because the volume root contains the `0/` subdirectory.
So one named volume is mounted in both containers (at `/agents/` in the bot, at `/workspace` in the agent), and the subdirectory `0/` is the isolation boundary. **This requires the agent container to treat `/workspace/0/` as its workspace root, not `/workspace/`.** Otherwise the relative attachment path `incoming/photo.jpg` from D-05 resolves to `/workspace/incoming/photo.jpg`, which the bot never wrote. This is a platform concern. In the Phase 05 single-agent placeholder there is no cross-agent collision, but the subdirectory offset still has to be handled on the agent side.

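
The path offset in this topology can be shown with plain path arithmetic. The sketch below assumes only the two mount points listed above; `as_seen_by_agent` is a hypothetical helper for illustration, not part of the codebase.

```python
from pathlib import PurePosixPath

BOT_MOUNT = PurePosixPath("/agents")       # volume root inside the bot
AGENT_MOUNT = PurePosixPath("/workspace")  # same volume root inside the agent


def as_seen_by_agent(bot_path: str) -> PurePosixPath:
    """Translate a bot-side path on the shared volume to the agent's view."""
    return AGENT_MOUNT / PurePosixPath(bot_path).relative_to(BOT_MOUNT)


written_by_bot = "/agents/0/incoming/photo.jpg"
agent_view = as_seen_by_agent(written_by_bot)

# What the agent opens if it resolves "incoming/photo.jpg" against /workspace.
naive_lookup = AGENT_MOUNT / "incoming/photo.jpg"
paths_match = agent_view == naive_lookup

print(agent_view)    # where the file actually is, from the agent's side
print(naive_lookup)  # where a /workspace-relative lookup would search
```
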
---
## Code Examples
### AgentApi usage (verified from source)
```python
# Source: external/platform-agent_api/lambda_agent_api/agent_api.py
agent = AgentApi(
    agent_id="agent-0",
    base_url="ws://lambda.coredump.ru:7000/agent_0/",
    on_disconnect=lambda a: connected_agents.pop(a.id, None),
    chat_id=0,
)
await agent.connect()  # Must call before send_message
async for event in agent.send_message("Hello", attachments=["incoming/photo.jpg"]):
    if isinstance(event, MsgEventTextChunk):
        print(event.text)
    elif isinstance(event, MsgEventSendFile):
        # event.path = "output/report.pdf"
        abs_path = Path(agent_workspace_path) / event.path
await agent.close()  # Triggers on_disconnect
```
### Matrix file upload (verified from bot.py)
```python
# Source: adapter/matrix/bot.py send_outgoing()
with file_path.open("rb") as handle:
    upload_response, _ = await client.upload(
        handle,
        content_type=attachment.mime_type or "application/octet-stream",
        filename=attachment.filename or file_path.name,
        filesize=file_path.stat().st_size,
    )
content_uri = upload_response.content_uri
await client.room_send(room_id, "m.room.message", {
    "msgtype": "m.file",  # or m.image, m.audio, m.video
    "body": filename,
    "url": content_uri,
})
```
### YAML config extension (target format)
```yaml
# config/matrix-agents.yaml (new format per D-02/D-03)
user_agents:
  "@user0:matrix.lambda.coredump.ru": agent-0
  "@user1:matrix.lambda.coredump.ru": agent-1
agents:
  - id: agent-0
    label: "Agent 0"
    base_url: "ws://lambda.coredump.ru:7000/agent_0/"
    workspace_path: "/agents/0/"
  - id: agent-1
    label: "Agent 1"
    base_url: "ws://lambda.coredump.ru:7000/agent_1/"
    workspace_path: "/agents/1/"
```
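
A sketch of how the target format might be turned into registry inputs follows. It works on the plain dict that `yaml.safe_load()` would return, so it runs without PyYAML. `parse_registry` is a hypothetical function, and the startup check for dangling `user_agents` entries is a suggestion made here, not existing behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentDefinition:
    agent_id: str
    label: str
    base_url: str
    workspace_path: str


def parse_registry(raw: dict) -> tuple[list[AgentDefinition], dict[str, str]]:
    """Hypothetical parser for the dict that yaml.safe_load() would return."""
    agents = [
        AgentDefinition(
            agent_id=item["id"],
            label=item.get("label", item["id"]),
            base_url=item["base_url"],
            workspace_path=item["workspace_path"],
        )
        for item in raw.get("agents", [])
    ]
    known = {agent.agent_id for agent in agents}
    user_agents = dict(raw.get("user_agents") or {})
    # Fail at startup rather than on a user's first message.
    dangling = sorted(uid for uid, aid in user_agents.items() if aid not in known)
    if dangling:
        raise ValueError(f"user_agents reference unknown agents: {dangling}")
    return agents, user_agents


raw = {
    "user_agents": {"@user0:matrix.lambda.coredump.ru": "agent-0"},
    "agents": [
        {
            "id": "agent-0",
            "label": "Agent 0",
            "base_url": "ws://lambda.coredump.ru:7000/agent_0/",
            "workspace_path": "/agents/0/",
        }
    ],
}
agents, user_agents = parse_registry(raw)
```
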
---
## Runtime State Inventory
> Phase includes refactoring but NOT renaming of string identifiers in user-facing data. Users interacting with the old multi-room bot will have SQLite room_meta records with old schema keys.
| Category | Items Found | Action Required |
|----------|-------------|------------------|
| Stored data (SQLite) | `lambda_matrix.db` (dev). Room meta records from the old multi-room flow contain `chat_id: "C1"`, `space_id`, `redirect_room_id`, `agent_id`. | No migration. Per D-05 and the Claude's Discretion items: ignore existing Space+rooms, do not migrate. New users get DM-first. Old users' DM rooms will lack the `welcomed` key; the first message in such a room goes through the normal message dispatch path (acceptable). |
| Stored data (SQLite) | `selected_agent_id` key in user metadata — written by `!agent` command being deleted. | No migration needed. `!agent` is gone. The new routing uses `user_agents` from YAML config. Old `selected_agent_id` values are orphaned but harmless. |
| Live service config | No external services with stored config (no n8n, no Datadog). | None. |
| OS-registered state | None. Bot runs in Docker, no launchd/systemd registration. | None. |
| Secrets/env vars | `AGENT_BASE_URL` (global) → replaced by per-agent `base_url` in YAML. `SURFACES_WORKSPACE_DIR` (global workspace) → per-agent `workspace_path` from YAML. Both env vars become deprecated for prod but remain for backward compat in dev. | Update `.env.example`. Add `.env.prod` template. |
| Build artifacts | None in prod context. Local: `.venv`, `__pycache__` — unaffected. | None. |
---
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework | pytest 9.0.2 + pytest-asyncio |
| Config file | `pyproject.toml` (`asyncio_mode = "auto"`) |
| Quick run command | `uv run pytest tests/adapter/matrix/ -q` |
| Full suite command | `uv run pytest tests/ -q` |
### Current Test Status (pre-Phase-05)
| File | Status | Disposition in Phase 05 |
|------|--------|-------------------------|
| test_converter.py | 14 passing | Keep as-is |
| test_files.py | 2 passing | Update for new path format |
| test_reactions.py | 2 passing | Keep as-is |
| test_restart_persistence.py | 5 passing | Keep; update if routing logic changes |
| test_routing_enforcement.py | 5 passing | Update for user_agents routing model |
| test_store.py | 2 passing | Keep as-is |
| test_agent_handler.py | failing (import?) | DELETE — !agent is deleted |
| test_agent_registry.py | failing (import?) | REWRITE — test new AgentDefinition schema |
| test_chat_space.py | failing | DELETE — Space provisioning deleted |
| test_confirm.py | failing | Keep or update |
| test_context_commands.py | 4 failing | REWRITE — !save/!load deleted; keep !context, add !clear |
| test_dispatcher.py | 20 failing | REWRITE — DM-first flow replaces multi-room |
| test_invite_space.py | 3 failing | DELETE and REPLACE with DM-first invite tests |
| test_routed_platform.py | 1 failing | REWRITE — user_agents lookup replaces room binding |
| test_send_outgoing.py | failing | REWRITE — per-agent workspace_path |
### Phase Requirements → Test Map
| Behavior | Test Type | Automated Command | Wave |
|----------|-----------|-------------------|------|
| AgentRegistry parses new YAML format (user_agents + base_url/workspace_path) | unit | `uv run pytest tests/adapter/matrix/test_agent_registry.py -x` | Wave 1 |
| Unauthorized user gets access-denied message on invite | unit | `uv run pytest tests/adapter/matrix/test_invite_dm.py -x` | Wave 2 |
| Authorized user gets welcome on DM invite | unit | `uv run pytest tests/adapter/matrix/test_invite_dm.py -x` | Wave 2 |
| Message from authorized user routes to correct delegate | unit | `uv run pytest tests/adapter/matrix/test_routed_platform.py -x` | Wave 2 |
| Incoming file saved to `incoming/{filename}` under agent workspace | unit | `uv run pytest tests/adapter/matrix/test_files.py -x` | Wave 3 |
| !clear command returns "Контекст сброшен." | unit | `uv run pytest tests/adapter/matrix/test_context_commands.py -x` | Wave 2 |
| Full suite green | integration | `uv run pytest tests/ -q` | Phase gate |
### Wave 0 Gaps
- [ ] `tests/adapter/matrix/test_invite_dm.py` — DM-first invite flow (new file)
- [ ] Updated `tests/adapter/matrix/test_agent_registry.py` — new schema
*(All other existing test infrastructure is in place. No new framework install needed.)*
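
As a starting point for `test_invite_dm.py`, the sketch below shows the fake-client style such tests could use. The handler is a reduced, hypothetical version with a simplified signature, and `WELCOME` / `DENIED` are placeholders for the D-04 and D-10 texts. The tests call `asyncio.run` directly so they do not depend on the runner; with `asyncio_mode = "auto"` they could equally be written as `async def` tests.

```python
import asyncio

WELCOME = "welcome"  # placeholder for the D-04 text
DENIED = "denied"    # placeholder for the D-10 text


class FakeClient:
    """Records the calls a Matrix client would receive."""

    def __init__(self) -> None:
        self.joined: list[str] = []
        self.sent: list[tuple[str, str]] = []

    async def join(self, room_id: str) -> None:
        self.joined.append(room_id)

    async def room_send(self, room_id: str, message_type: str, content: dict) -> None:
        self.sent.append((room_id, content["body"]))


async def handle_invite(client, room_id: str, sender: str, user_agents: dict[str, str]) -> None:
    """Reduced handler under test (hypothetical signature)."""
    await client.join(room_id)
    body = WELCOME if sender in user_agents else DENIED
    await client.room_send(room_id, "m.room.message", {"msgtype": "m.text", "body": body})


def test_authorized_user_gets_welcome() -> None:
    client = FakeClient()
    users = {"@user0:example.org": "agent-0"}
    asyncio.run(handle_invite(client, "!dm:example.org", "@user0:example.org", users))
    assert client.joined == ["!dm:example.org"]
    assert client.sent == [("!dm:example.org", WELCOME)]


def test_unauthorized_user_gets_denied() -> None:
    client = FakeClient()
    asyncio.run(handle_invite(client, "!dm:example.org", "@stranger:example.org", {}))
    assert client.joined == ["!dm:example.org"]
    assert client.sent == [("!dm:example.org", DENIED)]
```
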
---
## Environment Availability
| Dependency | Required By | Available | Version | Fallback |
|------------|------------|-----------|---------|----------|
| uv / Python 3.11 | tests, bot run | ✓ | Python 3.11.9, pytest 9.0.2 | — |
| Docker | docker-compose.prod.yml | ✓ (assumed dev machine) | — | Manual install |
| matrix-nio | Matrix adapter | ✓ | installed in .venv | — |
| pyyaml | agent_registry.py | ✓ | installed (yaml import works in bot context) | — |
| lambda-agent:latest image | docker-compose.prod.yml | ✗ | placeholder — platform team owns | Use `build: ./external/platform-agent` for local testing |
**Missing dependencies (local workaround only):**
- `lambda-agent:latest`: docker-compose.prod.yml references this as a placeholder image owned by the platform team. For local testing, either build the service from source with `build: ./external/platform-agent` or stub it with `image: busybox`.
---
## Open Questions
1. **is_direct detection for group room rejection (D-05, Claude's Discretion)**
- What we know: matrix-nio's `InviteMemberEvent` does not expose `is_direct` flag directly. The `MatrixRoom` type has member count accessible via `room.member_count` or `room.joined_members`.
- What's unclear: Whether InviteMemberEvent or MatrixRoom in nio exposes enough to reliably detect DM vs. group at invite time.
- Recommendation: In Phase 05, accept all invites and immediately check the inviter against `user_agents`. A group room where the bot is invited by an authorized user will also work, which is harmless. Optionally add a `room.member_count <= 2` check.
2. **True !clear semantics with MemorySaver**
- What we know: `RealPlatformClient._build_chat_api` creates a new `AgentApi(chat_id="0")` per request. The agent's `MemorySaver` is keyed by `chat_id` — always `"0"`. So context is NOT cleared by reconnecting.
- What's unclear: Whether `!clear` should work "for real" (requires platform to support a reset endpoint or different chat_id) or just show a user-facing message (MVP-acceptable).
- Recommendation: Phase 05 sends "Контекст сброшен." ("Context reset.") immediately (D-11). Document the limitation. Actual context reset is a platform concern.
3. **lambda-agent:latest image name**
- What we know: D-08 says "placeholder image `lambda-agent:latest`, to be confirmed with Azamat".
- Recommendation: Use `lambda-agent:latest` as image name in docker-compose.prod.yml. Add a comment indicating it's a placeholder. Provide `build:` fallback pointing to `./external/platform-agent` for local dev validation.
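
For open question 1, if DM detection is wanted later, one possible heuristic is sketched below. It is written as a pure function over plain values so it makes no claims about matrix-nio internals. The Matrix specification allows an `is_direct` flag in the content of an `m.room.member` invite; whether that content and a member count are available at invite time in this bot is an assumption that would need to be checked against the actual event objects. The function name `looks_like_dm` is hypothetical.

```python
from typing import Any, Mapping, Optional


def looks_like_dm(
    invite_content: Mapping[str, Any],
    member_count: Optional[int] = None,
) -> bool:
    """Heuristic DM check (hypothetical helper, not part of the codebase).

    invite_content: raw content dict of the invite's m.room.member event.
    member_count:   number of room members, if known at invite time.
    """
    # An explicit flag from the inviter's client is the strongest signal.
    if invite_content.get("is_direct") is True:
        return True
    # Not every client sets the flag, so fall back to room size when known.
    if member_count is not None:
        return member_count <= 2
    # With no information, do not claim the room is a DM.
    return False
```

A caller could accept the invite first and apply this check afterwards, which keeps the Phase 05 recommendation (accept, then authorize) unchanged.
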
---
## Assumptions Log
| # | Claim | Section | Risk if Wrong |
|---|-------|---------|---------------|
| A1 | `lambda-agent:latest` is the agreed image name for the agent container | docker-compose section | docker-compose.prod.yml won't work; easy to fix by updating image name |
| A2 | Group room invite detection is not required for Phase 05 (DM-first only means "start in DM", not "reject group invites") | DM-first onboarding | If group room rejection IS required, need to investigate matrix-nio InviteMemberEvent structure |
| A3 | !clear in Phase 05 is cosmetic (shows "cleared" but MemorySaver persists until agent restart) | !clear section | User confusion if they expect real context reset |
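
Assumption A3 follows from how a store keyed by `chat_id` behaves, which the toy model below illustrates. `ToyCheckpointStore` and `ToyAgentConnection` are hypothetical stand-ins written for this example; they are not `MemorySaver` or `AgentApi`, and only mimic the one property that matters here: history is looked up by `chat_id`, not by connection.

```python
class ToyCheckpointStore:
    """Stand-in for a server-side store keyed by chat_id."""

    def __init__(self):
        self._threads = {}

    def append(self, chat_id, message):
        self._threads.setdefault(chat_id, []).append(message)

    def history(self, chat_id):
        return list(self._threads.get(chat_id, []))


class ToyAgentConnection:
    """Stand-in for a client connection bound to one chat_id."""

    def __init__(self, store, chat_id):
        self._store = store
        self._chat_id = chat_id

    def send(self, text):
        self._store.append(self._chat_id, text)

    def context(self):
        return self._store.history(self._chat_id)


def demo_reconnect_keeps_context():
    store = ToyCheckpointStore()
    first = ToyAgentConnection(store, "0")
    first.send("hello")
    # "Reconnect": a new connection object, but the same chat_id.
    second = ToyAgentConnection(store, "0")
    return second.context()


def demo_new_chat_id_starts_empty():
    store = ToyCheckpointStore()
    ToyAgentConnection(store, "0").send("hello")
    # A different key sees no prior history.
    return ToyAgentConnection(store, "1").context()
```

In this model, closing and reopening a connection leaves the history intact, while a different key starts empty. Whether the real platform behaves the same way, and whether varying `chat_id` is compatible with D-01, are questions for the platform team.
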
---
## Project Constraints (from CLAUDE.md)
| Directive | Implication for Phase 05 |
|-----------|--------------------------|
| Platform calls go through `platform/interface.py` (Protocol) | RealPlatformClient stays the SDK boundary; AgentApi is internal to sdk/ |
| When connecting the real SDK, change only `platform/mock.py` | Phase 05 touches `sdk/real.py` for workspace_path; acceptable because it is a refinement, not a rewrite |
| Hotfixes (< 20 lines) go directly through Claude Code, not Codex | Phase 05 is >20 lines; must go through Codex via GSD |
| Implementation is done by codex:rescue | Plans must be in PLAN.md format so they can be passed to Codex |
| Never commit `.env` | `.env.prod` must be in `.gitignore`; only `.env.prod.example` is committed |
| `uv sync` for dependencies | No new pip installs; all deps already in pyproject.toml |
| `pytest tests/` for tests | Phase gate: `uv run pytest tests/ -q` must be green |
---
## Sources
### Primary (HIGH confidence)
- [VERIFIED: adapter/matrix/agent_registry.py] — current AgentDefinition/AgentRegistry structure
- [VERIFIED: adapter/matrix/bot.py] — _build_platform_from_env, MatrixBot, handle_invite, _materialize_incoming_attachments
- [VERIFIED: adapter/matrix/routed_platform.py] — _resolve_delegate logic
- [VERIFIED: adapter/matrix/files.py] — build_workspace_attachment_path, download_matrix_attachment
- [VERIFIED: adapter/matrix/handlers/agent.py] — !agent handler (to be deleted)
- [VERIFIED: adapter/matrix/handlers/auth.py] — provision_workspace_chat (to be replaced)
- [VERIFIED: adapter/matrix/handlers/context_commands.py] — !save/!load/!reset handlers
- [VERIFIED: adapter/matrix/handlers/__init__.py] — handler registration
- [VERIFIED: sdk/real.py] — RealPlatformClient, _build_chat_api, _attachment_from_send_file_event
- [VERIFIED: sdk/upstream_agent_api.py] — sys.path patching, AgentApi import
- [VERIFIED: external/platform-agent_api/lambda_agent_api/agent_api.py] — actual AgentApi implementation
- [VERIFIED: config/matrix-agents.yaml] — current format
- [VERIFIED: docker-compose.yml] — existing dev compose topology
- [VERIFIED: .env.example] — current env var set
- [VERIFIED: docs/deploy-architecture.md] — prod topology spec
- [VERIFIED: .planning/phases/05-mvp-deployment/05-CONTEXT.md] — locked decisions
### Secondary (MEDIUM confidence)
- [ASSUMED: A1] lambda-agent image name — from CONTEXT.md D-08 description
- [ASSUMED: A2] Group room handling scope — inferred from D-05 wording
---
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH — all libraries verified in existing code
- Architecture patterns: HIGH — all patterns verified against actual source files
- Pitfalls: HIGH — all pitfalls derived from reading actual code, not from training assumptions
- Test strategy: HIGH — test files enumerated and statuses verified by running pytest
**Research date:** 2026-04-27
**Valid until:** 2026-05-27 (stable codebase; re-verify earlier if platform-agent_api changes)