surfaces/.planning/phases/05-mvp-deployment/05-RESEARCH.md

Phase 05: MVP Deployment — Research

Researched: 2026-04-27
Domain: Matrix bot deployment (config refactor, DM-first onboarding, file transfer, docker-compose prod topology)
Confidence: HIGH (all findings verified against actual codebase)


<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

Single-chat architecture

  • D-01: chat_id=0 for all messages. One agent context per user. !clear resets context.
  • D-02: Delete all multi-room infrastructure: C1/C2/C3, !new, !archive, !rename, Space-creation, room provisioning. Matrix bot operates only in DM room.
  • D-03: Delete !save and !load — unreliable without persistent memory in agent.

Onboarding (DM-first)

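  • D-04: On DM invite, accept and send the welcome message: "Привет! Я Lambda AI-агент. Просто напиши — и я отвечу. !clear чтобы начать новый разговор, !context чтобы посмотреть статус." ("Hi! I'm the Lambda AI agent. Just write and I'll reply. !clear to start a new conversation, !context to see the status.")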
  • D-05: No Space, no child rooms. All conversation in one DM room.

!clear (new command)

  • D-06: Reset agent context: close the current AgentApi connection and create a new one (await agent.close() + await agent.connect()). Confirm with: "Контекст сброшен. Начнём с чистого листа." ("Context reset. Let's start with a clean slate.")
  • D-11: No confirmation dialog — immediate reset.

!agent command

  • D-07: Delete completely. user→agent mapping is static from config.

Agent config (config/matrix-agents.yaml)

  • D-02 (config): Extend current matrix-agents.yaml — add user_agents dict and base_url/workspace_path fields per agent.
  • D-03 (schema): AgentDefinition gains base_url: str and workspace_path: str. AgentRegistry adds user_agents: dict[matrix_user_id, agent_id] and get_agent_id_by_user(matrix_user_id).

Routing user → agent in _build_platform_from_env

  • D-04 (routing): Per-agent URL from config instead of global AGENT_BASE_URL. _build_platform_from_env builds delegates with correct base_url per agent. RoutedPlatformClient._resolve_delegate uses user_agents from registry.

Incoming files (user → agent)

  • D-05 (files): Path inside agent workspace: incoming/{filename}. Absolute: {workspace_path}/incoming/{filename}. Update files.py: build_workspace_attachment_path takes agent workspace_path and builds incoming/{filename}. Pass to agent.send_message() as attachments=["incoming/{filename}"] (relative to /workspace).
  • D-06 (files): workspace_path is taken from AgentDefinition by user's agent_id.

Outgoing files (agent → user)

  • D-07 (files): On MsgEventSendFile(path="output/report.pdf") — read from {workspace_path}/{path}. Send as Matrix file message.

docker-compose for prod

  • D-08: docker-compose.prod.yml includes: Matrix bot + agent container (placeholder image lambda-agent:latest) + named volume agents.
  • D-09: Named volume agents mounted in Matrix bot as /agents/ and in agent container as /workspace. Env vars from .env.prod. Start: docker compose -f docker-compose.prod.yml up.

Unauthorized users

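  • D-10: If the Matrix user_id is not in user_agents, accept the invite and reply: "К вашему аккаунту не привязан агент. Напишите @og_mput в Telegram для получения доступа." ("No agent is linked to your account. Message @og_mput on Telegram to get access.") Further messages are ignored (or the same reply is repeated).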

!settings and other settings commands

  • D-12: Delete !settings, !settings soul, !settings skills, !settings safety.

Claude's Discretion

  • MATRIX_AGENT_REGISTRY_PATH — keep as env var for config path (already exists)
  • Format of .env.prod
  • Group room invites (non-DM) — reject automatically
  • Existing Space+rooms for old users — ignore, do not migrate

Deferred Ideas (OUT OF SCOPE)

  • platform-master integration (dynamic get_agent_url via POST /api/v1/create) — when feat/storage is ready
  • !agent as admin-override — not needed for MVP
  • Per-chat context isolation via a different chat_id (currently chat_id=0): waiting for a platform signal
</user_constraints>

Summary

Phase 05 is a code-and-config refactor of the existing Matrix adapter. There is no new framework to learn — the full stack (matrix-nio, AgentApi, docker-compose) is already in use. The work is: (1) simplify the data model from multi-room to single DM room per user, (2) extend AgentRegistry with per-user routing and per-agent URLs/paths, (3) reroute file I/O to the shared /agents/ volume, (4) write a prod docker-compose, and (5) delete substantial legacy code (Space provisioning, C1/C2/C3, !agent, !save, !load, !settings).

The current codebase has 35 failing tests (pre-existing on feat/deploy), mostly in test_dispatcher.py, test_invite_space.py, test_routed_platform.py — all testing behaviors that Phase 05 will delete or replace. New tests must cover the simplified DM-first invite flow, the user_agents lookup path, and the new file path logic. Existing passing tests (203) must stay green.

Primary recommendation: Execute as three sequential mini-plans: (A) config/registry extension + routing, (B) DM-first onboarding + !clear + legacy deletion, (C) file transfer + docker-compose.prod.yml + .env.prod.


Standard Stack

All libraries are already installed and in use. No new dependencies.

Core (already in pyproject.toml)

| Library | Version | Purpose | Source |
|---|---|---|---|
| matrix-nio | installed | Matrix client: join rooms, send messages, upload files | [VERIFIED: adapter/matrix/bot.py imports] |
| pyyaml | installed | YAML config parsing in AgentRegistry | [VERIFIED: agent_registry.py line 7] |
| aiohttp | installed | WebSocket transport inside AgentApi | [VERIFIED: external/platform-agent_api/lambda_agent_api/agent_api.py] |
| structlog | installed | Structured logging | [VERIFIED: bot.py imports] |
| python-dotenv | installed | .env loading | [VERIFIED: bot.py line 79] |

AgentApi (external, local path)

external/platform-agent_api/lambda_agent_api/agent_api.py — imported via sdk/upstream_agent_api.py which patches sys.path.

Verified constructor signature [VERIFIED: agent_api.py]:

```python
AgentApi(
    agent_id: str,
    base_url: str,               # ws://host:port/agent_N/
    callback: Optional[Callable] = None,
    on_disconnect: Optional[Callable[["AgentApi"], None]] = None,
    chat_id: int = 0,
)
```

Key AgentApi facts [VERIFIED: agent_api.py]:

  • self.url = urljoin(base_url, f"v1/agent_ws/{chat_id}/") — builds WebSocket URL automatically from base_url + chat_id
  • await agent.connect() — must be called before send_message()
  • await agent.close() — explicit close; triggers on_disconnect callback, drains queue
  • async for event in agent.send_message(text, attachments=["incoming/file.pdf"]) — attachments are paths relative to /workspace
  • agent.id attribute (not agent_id) — used as dict key in connection pool
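AgentApi builds its WebSocket URL with urljoin, so the trailing slash on base_url decides whether the /agent_N/ segment survives. The following is a minimal sketch that assumes Python 3.11's urllib.parse, which treats ws:// as a relative-capable scheme. agent_ws_url and check_base_url are hypothetical helpers that mirror the join so config values can be checked before deploy. They are not part of AgentApi.

```python
from __future__ import annotations

from urllib.parse import urljoin


def agent_ws_url(base_url: str, chat_id: int | str = 0) -> str:
    """Mirror AgentApi's construction: urljoin(base_url, f"v1/agent_ws/{chat_id}/")."""
    return urljoin(base_url, f"v1/agent_ws/{chat_id}/")


def check_base_url(base_url: str) -> str:
    """Hypothetical config check: require a ws/wss scheme and a trailing slash."""
    if not base_url.startswith(("ws://", "wss://")):
        raise ValueError(f"base_url must be a ws:// or wss:// URL: {base_url!r}")
    if not base_url.endswith("/"):
        # Without the slash, urljoin replaces the last path segment (agent_0).
        raise ValueError(f"base_url must end with '/': {base_url!r}")
    return base_url


if __name__ == "__main__":
    print(agent_ws_url("ws://lambda.coredump.ru:7000/agent_0/"))  # ws://lambda.coredump.ru:7000/agent_0/v1/agent_ws/0/
    print(agent_ws_url("ws://lambda.coredump.ru:7000/agent_0"))   # ws://lambda.coredump.ru:7000/v1/agent_ws/0/
```

Running a check like this in the registry loader would catch a missing slash at startup. Otherwise it would only surface later, as a connection to the wrong path. The sketch also shows that chat_id="0" and chat_id=0 yield the same URL.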

Lifecycle for !clear [VERIFIED: agent_api.py close() + connect()]: close() triggers on_disconnect, which removes the instance from the pool, so the next message creates a fresh connection. For an immediate reset, call close() and then connect() on the same instance. This is safe because _cleanup() resets the _connected flag. Reconnecting does not by itself clear agent-side memory (see Pattern 7).


Architecture Patterns

Existing Code to Modify (not rewrite)

adapter/matrix/
  agent_registry.py    — extend AgentDefinition + AgentRegistry
  bot.py               — _build_platform_from_env, handle_invite, _materialize_incoming_attachments
  routed_platform.py   — _resolve_delegate (add user_agents lookup)
  files.py             — build_workspace_attachment_path (new path logic)
  room_router.py       — resolve_chat_id (chat_id=0 for DM-first, no C1/C2/C3 lookup needed)
  handlers/
    agent.py           — DELETE or make no-op
    auth.py            — replace provision_workspace_chat with simple DM-accept
    context_commands.py — DELETE make_handle_save, make_handle_load; keep make_handle_context
    settings.py        — DELETE or strip handle_settings, handle_settings_soul, etc.
    __init__.py        — unregister deleted commands

config/
  matrix-agents.yaml   — extend format

docker-compose.prod.yml  — new file
.env.prod               — new file (or .env.example update)

Pattern 1: AgentRegistry Extension

Current AgentDefinition has only agent_id and label. New fields needed [VERIFIED: CONTEXT.md D-03]:

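```python
# adapter/matrix/agent_registry.py
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentDefinition:
    agent_id: str
    label: str
    base_url: str          # ws://lambda.coredump.ru:7000/agent_0/ (trailing slash matters for urljoin)
    workspace_path: str    # /agents/0/ (bot-side path on the shared volume)


class AgentRegistry:
    def __init__(
        self,
        agents: list[AgentDefinition],
        user_agents: dict[str, str] | None = None,   # NEW: Matrix user_id -> agent_id
    ) -> None:
        self.agents = tuple(agents)
        self._by_id = {agent.agent_id: agent for agent in self.agents}
        self.user_agents = dict(user_agents or {})   # NEW; the default keeps old callers working

    def get_agent_id_by_user(self, matrix_user_id: str) -> str | None:  # NEW
        return self.user_agents.get(matrix_user_id)

    def get_agent_by_user(self, matrix_user_id: str) -> AgentDefinition | None:  # hypothetical helper
        # Convenience for file I/O, which needs the full definition (workspace_path).
        agent_id = self.get_agent_id_by_user(matrix_user_id)
        return self._by_id.get(agent_id) if agent_id else None
```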

Pattern 2: _build_platform_from_env with Per-Agent URLs

Current code uses _agent_base_url_from_env() globally for all delegates [VERIFIED: bot.py lines 148-161]. New pattern:

```python
def _build_platform_from_env(*, store: StateStore, chat_mgr: ChatManager) -> PlatformClient:
    backend = os.environ.get("MATRIX_PLATFORM_BACKEND", "mock").strip().lower()
    if backend != "real":
        return MockPlatformClient()

    prototype_state = PrototypeStateStore()
    registry = _load_agent_registry_from_env(required=True)
    if registry is None:  # explicit check: `assert` is stripped under python -O
        raise RuntimeError("MATRIX_AGENT_REGISTRY_PATH must point to a valid agent registry")
    delegates = {
        agent.agent_id: RealPlatformClient(
            agent_id=agent.agent_id,
            agent_base_url=agent.base_url,        # PER-AGENT URL from config
            workspace_path=agent.workspace_path,  # new constructor arg (see Pitfall 6)
            prototype_state=prototype_state,
            platform="matrix",
        )
        for agent in registry.agents
    }
    return RoutedPlatformClient(
        chat_mgr=chat_mgr,
        store=store,
        delegates=delegates,
        registry=registry,  # pass registry for user_agents lookup
    )
```

Pattern 3: RoutedPlatformClient._resolve_delegate (user_agents lookup)

Current implementation [VERIFIED: routed_platform.py lines 80-110] resolves agent via room_meta.get("agent_id") — requires the room to be pre-bound to an agent. New DM-first model: look up agent_id from user_agents dict by Matrix user_id.

The _resolve_delegate signature receives user_id (Matrix user_id string) and local_chat_id (room_id in DM-first model). New logic:

```python
async def _resolve_delegate(
    self, user_id: str, local_chat_id: str
) -> tuple[PlatformClient, str]:
    # local_chat_id (the DM room_id) is no longer needed for routing in DM-first mode.
    # 1. Look up agent_id by Matrix user_id
    agent_id = self._registry.get_agent_id_by_user(user_id)
    if agent_id is None:
        raise PlatformError(
            f"no agent configured for user: {user_id}",
            code="MATRIX_USER_NOT_CONFIGURED",
        )
    # 2. Get delegate
    delegate = self._delegates.get(agent_id)
    if delegate is None:
        raise PlatformError(f"unknown agent: {agent_id}", code="MATRIX_AGENT_NOT_FOUND")
    # 3. chat_id=0 always (single-chat arch, D-01)
    return delegate, "0"
```

Pattern 4: DM-First Invite Handler

Replace handle_invite + provision_workspace_chat in auth.py [VERIFIED: auth.py lines 122-163]:

```python
async def handle_invite(client, room, event, platform, store, auth_mgr, chat_mgr, registry) -> None:
    # registry: AgentRegistry, threaded in from the runtime (see Pitfall 2)
    matrix_user_id = getattr(event, "sender", "")

    # Accept every invite; authorization is decided by the user_agents lookup.
    # No is_direct check here: see the note below and the anti-patterns list.
    await client.join(room.room_id)

    # D-10: unauthorized users get a fixed reply and nothing else
    if registry.get_agent_id_by_user(matrix_user_id) is None:
        await client.room_send(room.room_id, "m.room.message", {
            "msgtype": "m.text",
            "body": "К вашему аккаунту не привязан агент. Напишите @og_mput в Telegram для получения доступа.",
        })
        return

    # Idempotent: don't send the welcome twice
    meta = await get_room_meta(store, room.room_id)
    if meta and meta.get("welcomed"):
        return

    await set_room_meta(store, room.room_id, {
        "matrix_user_id": matrix_user_id,
        "chat_id": "0",          # single-chat: chat_id=0 always (D-01)
        "welcomed": True,
    })
    await client.room_send(room.room_id, "m.room.message", {
        "msgtype": "m.text",
        "body": "Привет! Я Lambda AI-агент. Просто напиши — и я отвечу. !clear чтобы начать новый разговор, !context чтобы посмотреть статус.",
    })
```

Note on is_direct detection: matrix-nio does not surface is_direct as a dedicated attribute on InviteMemberEvent. Per the Matrix spec, the inviter's m.room.member content may carry an is_direct flag, but clients do not set it reliably. MatrixRoom exposes join_rule and member counts, which only support a heuristic (for example, an invite-only room with two members). The safer approach is to accept all invites and authorize via user_agents. Group room detection is a Claude's Discretion item. The simplest Phase 05 implementation skips it and only turns away unauthorized users.
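The invite decision is small enough to keep as a pure function, which makes the Wave 2 invite tests simple. The sketch below follows the assumptions of this section. InviteAction and decide_invite_action are hypothetical names. The optional member-count check is the heuristic from Open Question 1 and is off by default.

```python
from __future__ import annotations

from enum import Enum


class InviteAction(Enum):
    LEAVE = "leave"      # optional: refuse an obvious group room
    DENY = "deny"        # join, send the "no agent linked" reply (D-10)
    SILENT = "silent"    # join, already welcomed: send nothing
    WELCOME = "welcome"  # join, send the welcome message once (D-04)


def decide_invite_action(
    sender: str,
    user_agents: dict[str, str],
    already_welcomed: bool,
    member_count: int | None = None,
    reject_groups: bool = False,
) -> InviteAction:
    """Decide how to react to an invite from `sender` (hypothetical helper)."""
    # Heuristic only: member counts at invite time are not a reliable DM signal.
    if reject_groups and member_count is not None and member_count > 2:
        return InviteAction.LEAVE
    if sender not in user_agents:
        return InviteAction.DENY
    if already_welcomed:
        return InviteAction.SILENT
    return InviteAction.WELCOME
```

handle_invite would then map each action to the join / room_send / room_leave calls, keeping Matrix I/O out of the logic under test.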

Pattern 5: File Path for Incoming Attachments

Current build_workspace_attachment_path [VERIFIED: files.py lines 31-46] builds: surfaces/matrix/{safe_user}/{safe_room}/inbox/{stamp}-{filename}

New path needed [VERIFIED: CONTEXT.md D-05]: incoming/{filename} (relative), absolute: {workspace_path}/incoming/{filename}

New signature:

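```python
from datetime import UTC, datetime
from pathlib import Path

# _sanitize_component is the existing helper in adapter/matrix/files.py


def build_workspace_attachment_path(
    *,
    workspace_path: str,    # agent's workspace_path from AgentDefinition, e.g. "/agents/0/"
    filename: str,
    timestamp: str | None = None,
) -> tuple[str, Path]:
    """Return (relative_path_for_agent, absolute_path_for_download)."""
    stamp = timestamp or datetime.now(UTC).strftime("%Y%m%d-%H%M%S")
    safe_name = _sanitize_component(filename) or "attachment.bin"
    # The stamp prefix avoids overwriting same-named uploads. D-05 specifies plain
    # incoming/{filename}, so drop the prefix if the agent expects exact names.
    relative_path = f"incoming/{stamp}-{safe_name}"    # relative to the agent workspace (D-05)
    absolute_path = Path(workspace_path) / relative_path
    return relative_path, absolute_path
```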

Callers: download_matrix_attachment() in files.py and _materialize_incoming_attachments() in bot.py. Both need to receive workspace_path (from AgentDefinition). The bot must resolve agent_id for the sender before downloading — requires registry.get_agent_id_by_user(matrix_user_id).

Pattern 6: Outgoing Files (MsgEventSendFile handling)

Current send_message in sdk/real.py [VERIFIED: real.py lines 88-98] already calls _attachment_from_send_file_event but the result goes into MessageResponse.attachments — which OutgoingMessage.attachments then carries. The send_outgoing() in bot.py [VERIFIED: bot.py lines 656-686] already handles event.attachments by resolving attachment.workspace_path via resolve_workspace_attachment_path(workspace_root, ...).

Current problem: workspace_root is Path(os.environ.get("SURFACES_WORKSPACE_DIR", "/workspace")) — a global, not per-agent. With shared volume /agents/, the agent workspace is /agents/0/, /agents/1/, etc.

Fix strategy: When processing MsgEventSendFile(path="output/report.pdf") for agent N, the absolute path is /agents/N/output/report.pdf. The workspace_path stored in Attachment (from _attachment_from_send_file_event) is "output/report.pdf". The workspace_root passed to resolve_workspace_attachment_path must be the agent's workspace_path (e.g. /agents/0/).

Two options:

  1. Store absolute path directly in Attachment.workspace_path (simplest — no env var needed)
  2. Pass per-agent workspace_root through context

Option 1 is simpler: in _attachment_from_send_file_event, when building Attachment, set workspace_path to the absolute path ({agent_workspace_path}/output/report.pdf). The resolve_workspace_attachment_path function already handles absolute paths [VERIFIED: files.py line 87-90: if path.is_absolute(): return path].

This means RealPlatformClient needs to know the agent's workspace_path — pass it in constructor.
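Option 1 joins an agent-supplied path onto a host directory, so paths that escape the workspace (e.g. ../1/secret.txt) should be rejected. The following is a minimal sketch that assumes event.path is workspace-relative after the existing /workspace/ prefix stripping. resolve_send_file_path is a hypothetical helper, and its check is purely lexical: symlinks are not followed.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_send_file_path(workspace_path: str, event_path: str) -> PurePosixPath:
    """Map a workspace-relative MsgEventSendFile.path to an absolute bot-side path."""
    if posixpath.isabs(event_path):
        raise ValueError(f"expected a workspace-relative path, got {event_path!r}")
    root = PurePosixPath(posixpath.normpath(workspace_path))
    # normpath collapses "..", so escapes become visible before the containment check
    candidate = PurePosixPath(posixpath.normpath(posixpath.join(str(root), event_path)))
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes agent workspace: {event_path!r}")
    return candidate
```

The returned absolute path is what would be stored in Attachment.workspace_path, which resolve_workspace_attachment_path then passes through unchanged.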

Pattern 7: !clear Command

New handler in context_commands.py (or new clear.py):

```python
def make_handle_clear():
    async def handle_clear(event: IncomingCommand, auth_mgr, platform, chat_mgr, settings_mgr):
        # `platform` is the RoutedPlatformClient. Two ways to reach the delegate:
        # A) give RoutedPlatformClient a public reset_agent(user_id) method (cleaner)
        # B) reach into platform._registry / platform._delegates (shown here)
        registry = getattr(platform, "_registry", None)
        agent_id = registry.get_agent_id_by_user(event.user_id) if registry else None
        delegate = getattr(platform, "_delegates", {}).get(agent_id) if agent_id else None
        if delegate is not None and hasattr(delegate, "reset_agent"):
            await delegate.reset_agent()   # new, best-effort method on RealPlatformClient
        # D-06 text; D-11: confirm immediately, no dialog
        return [OutgoingMessage(chat_id=event.chat_id, text="Контекст сброшен. Начнём с чистого листа.")]
    return handle_clear
```

reset_agent() on RealPlatformClient: Close the active AgentApi connection. Since RealPlatformClient currently creates a fresh AgentApi per request (see _build_chat_api — no connection pool) [VERIFIED: real.py lines 173-178], there's nothing to close. The reset is implicit — the next send_message creates a fresh AgentApi(chat_id="0") which reconnects.

However: chat_id="0" is a string in RealPlatformClient._build_chat_api [VERIFIED: real.py line 177: chat_id=str(chat_id)], but AgentApi constructor takes chat_id: int = 0. The urljoin(base_url, f"v1/agent_ws/{chat_id}/") call will produce v1/agent_ws/0/ regardless.

Actual reset mechanism with the current RealPlatformClient: a new AgentApi is created per send_message() call (stateless client pattern). The conversation context therefore lives in the remote agent's MemorySaver, not in the client. The existing !reset command already calls disconnect_chat [VERIFIED: context_commands.py make_handle_reset]. !clear can reuse that pattern, calling platform.disconnect_chat("0") if available, or simply confirm immediately. Either way, MemorySaver is keyed by chat_id, and because chat_id is always 0, the stored context survives reconnection.

Implication: True context reset with MemorySaver requires the agent to restart or use a different chat_id. For Phase 05 MVP, !clear can: (a) confirm to user "Контекст сброшен." and (b) note this is best-effort until agent side supports it. This matches D-11 (immediate, no confirmation dialog).
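A toy model makes this reasoning concrete. ToyCheckpointer is a hypothetical stand-in for agent-side memory keyed by chat_id. It is not LangGraph's MemorySaver API. Each one_connection call models a fresh per-request AgentApi.

```python
from __future__ import annotations


class ToyCheckpointer:
    """Toy agent-side memory keyed by chat_id (illustration only)."""

    def __init__(self) -> None:
        self._threads: dict[str, list[str]] = {}

    def append(self, chat_id: str, message: str) -> None:
        self._threads.setdefault(chat_id, []).append(message)

    def history(self, chat_id: str) -> list[str]:
        return list(self._threads.get(chat_id, []))


def one_connection(memory: ToyCheckpointer, chat_id: int | str, message: str) -> list[str]:
    """Model connect -> send -> close; nothing survives on the client side."""
    key = str(chat_id)  # 0 and "0" map to the same thread, as in the ws URL
    memory.append(key, message)
    return memory.history(key)


memory = ToyCheckpointer()
one_connection(memory, 0, "hello")
# A "!clear" that only reconnects: the next connection still sees the old turn.
print(one_connection(memory, 0, "after clear"))  # ['hello', 'after clear']
```

Because the key never changes, reconnecting cannot clear anything. Only an agent-side reset or a new chat_id would. This matches treating !clear as best-effort for the MVP.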

Pattern 8: docker-compose.prod.yml

```yaml
services:
  matrix-bot:
    image: surfaces-bot:latest
    build: .
    env_file: .env.prod
    volumes:
      - agents:/agents/          # shared volume, bot side (D-09)
      - ./config:/app/config:ro  # matrix-agents.yaml
    restart: unless-stopped

  agent-0:
    image: lambda-agent:latest   # placeholder image name (D-08)
    env_file: .env.prod
    environment:
      AGENT_ID: "agent-0"
    volumes:
      - agents:/workspace        # same volume, agent side (D-09)
    restart: unless-stopped

volumes:
  agents:
    driver: local
```

Note: lambda-agent:latest is a placeholder image name per D-08. The platform team owns the actual image.

Anti-Patterns to Avoid

  • Do not introduce a long-lived AgentApi connection pool. The current RealPlatformClient creates a fresh AgentApi per request (stateless), which is correct for Phase 05; keep that pattern.
  • Do not add chat_id logic — single-chat arch means chat_id=0 always. Any code that increments or stores platform_chat_ids in room_meta is legacy being deleted.
  • Do not try to detect is_direct at invite time via matrix-nio — the library's InviteMemberEvent doesn't expose this reliably. Accept all invites, authorize by user_agents lookup.
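  • Do not change the AgentApi constructor call in sdk/real.py: _build_chat_api passes chat_id=str(chat_id). Keep it as is. AgentApi annotates chat_id as int, but the value only ends up in the URL path (v1/agent_ws/0/), so the string form produces the same result.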

Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| File upload to Matrix | Custom HTTP multipart | client.upload(handle, content_type, filename, filesize) | matrix-nio provides this; already used in bot.py send_outgoing |
| Matrix file message | Custom m.room.message | client.room_send(room_id, "m.room.message", {"msgtype": "m.file", ...}) | Already implemented in send_outgoing |
| YAML parsing | Custom parser | yaml.safe_load() (already in agent_registry.py) | Already works; just extend the schema |
| WebSocket to agent | Custom aiohttp ws | AgentApi from external/platform-agent_api | Already used via sdk/real.py |

Common Pitfalls

Pitfall 1: _materialize_incoming_attachments uses global SURFACES_WORKSPACE_DIR

What goes wrong: The bot downloads the file to /workspace/surfaces/matrix/... (the old path) when it should write to /agents/0/incoming/....
Why it happens: _materialize_incoming_attachments in bot.py [VERIFIED: bot.py line 449] reads the SURFACES_WORKSPACE_DIR env var. In prod this would have to be /agents/, but the per-user path varies.
How to avoid: Pass the agent's workspace_path (from AgentDefinition) into download_matrix_attachment. The bot must resolve matrix_user_id → agent_id → AgentDefinition.workspace_path before calling download. The registry object is available in build_runtime() but is not currently threaded into MatrixBot._materialize_incoming_attachments. Either (a) store the registry on MatrixRuntime, or (b) pass it into MatrixBot.__init__.

Pitfall 2: AgentRegistry reference not available in handlers

What goes wrong: handle_invite, _check_agent_routing, and _materialize_incoming_attachments all need the registry to look up user_agents. Currently the registry is loaded in build_runtime() and passed only to register_matrix_handlers.
Why it happens: MatrixBot doesn't store the registry; only the dispatcher gets it.
How to avoid: Store registry: AgentRegistry | None on MatrixRuntime and thread it into MatrixBot.

Pitfall 3: Existing tests test behaviors being deleted

What goes wrong: 35 currently failing tests (pre-existing) cover Space provisioning, !agent, C1/C2/C3, and !save/!load. Once those features are deleted, these tests must be deleted or replaced.
Why it happens: The test suite was written for the old multi-room architecture.
How to avoid: The plan should explicitly list which test files to delete or rewrite:

  • Delete: test_invite_space.py, test_agent_handler.py, test_chat_space.py
  • Rewrite: test_dispatcher.py (large — slim to DM-first behavior), test_routed_platform.py (update to user_agents lookup)
  • Update: test_files.py (new path format)
  • Keep: test_converter.py, test_store.py, test_restart_persistence.py, test_routing_enforcement.py, test_context_commands.py (partial)

Pitfall 4: resolve_chat_id returns C1/C2/C3 chat IDs

What goes wrong: room_router.resolve_chat_id [VERIFIED: room_router.py] reads room_meta.get("chat_id"). Old room_meta stores "C1", "C2", etc. In the DM-first model, chat_id is always "0".
How to avoid: Update the set_room_meta calls in the new invite handler to set "chat_id": "0". resolve_chat_id itself can remain as-is; it returns "0" when that is what's stored.

Pitfall 5: RoutedPlatformClient._resolve_delegate expects room_meta with agent_id

What goes wrong: The current _resolve_delegate [VERIFIED: routed_platform.py lines 80-110] reads room_meta.get("agent_id"), which requires the room to have been pre-bound. In the DM-first model with user_agents lookup, rooms are never explicitly bound.
How to avoid: Replace the agent_id lookup with registry.get_agent_id_by_user(user_id). The user_id parameter is the Matrix user_id string, which is already passed into send_message() / stream_message().

Pitfall 6: RealPlatformClient needs workspace_path for outgoing file resolution

What goes wrong: When the agent emits MsgEventSendFile(path="output/report.pdf"), the current _attachment_from_send_file_event strips the /workspace/ prefix [VERIFIED: real.py lines 207-218], leaving "output/report.pdf". send_outgoing in bot.py then resolves it against SURFACES_WORKSPACE_DIR, which doesn't know which agent's workspace to use.
How to avoid: Add workspace_path: str to RealPlatformClient.__init__. In _attachment_from_send_file_event, build the absolute path Path(workspace_path) / event.path and store it in Attachment.workspace_path. resolve_workspace_attachment_path already returns absolute paths unchanged [VERIFIED: files.py line 87-90].

Pitfall 7: docker-compose.prod.yml volume mount collision

What goes wrong: If the agents named volume is mounted as /workspace in every agent container, all agents share the same volume root. Agent-0 and agent-1 would both write to /workspace/output/ and collide.
Why it happens: The single named volume agents is mounted at its root in ALL agent containers.
How to avoid: Give each agent container its own volume or subpath. There are three options:
  • Mount a subpath with the long-syntax volume subpath option, supported only in newer Docker Compose / Engine releases; verify the deployed versions before relying on it.
  • Use separate named volumes per agent (agents_0, agents_1).
  • Configure the agent container with WORKSPACE_SUBDIR so it works under /workspace/{agent_id}/.
Per D-08 there is one placeholder agent container, so this is a platform concern. The naive single-volume setup also has a path mismatch. With workspace_path: "/agents/0/" in config, the bot writes to /agents/0/incoming/, which the agent sees as /workspace/0/incoming/, not /workspace/incoming/.

Correct topology per deploy-architecture.md [VERIFIED: docs/deploy-architecture.md]:

  • Volume agents mounted in bot as /agents/
  • Volume agents mounted in agent-0 as /workspace
  • Agent workspace_path in config: /agents/0/
  • Bot writes file to /agents/0/incoming/photo.jpg
  • Agent reads from /workspace/0/incoming/photo.jpg — WORKS if agent container mounts the volume at /workspace and the volume root contains /0/ subdirectory.

So the topology is one named volume, mounted in both containers: at /agents/ in the bot and at /workspace in the agent. The /0/ subdirectory is the isolation boundary. The agent must therefore treat /workspace/0/ as its root rather than /workspace/, which is a platform concern. With the Phase 05 single-agent placeholder there is no collision between agents, but the agent-side root still has to match.
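The path translation between the two mounts can be sketched as follows. The sketch assumes the D-09 mount points (/agents in the bot, /workspace in the agent), and the helper names are hypothetical. It makes the isolation boundary visible: the 0/ segment survives on the agent side.

```python
from pathlib import PurePosixPath

BOT_MOUNT = PurePosixPath("/agents")       # volume root inside the bot container (D-09)
AGENT_MOUNT = PurePosixPath("/workspace")  # same volume root inside the agent container (D-09)


def bot_to_agent_path(bot_path: str) -> PurePosixPath:
    """Translate a bot-side path on the shared volume to the agent's view of it."""
    # relative_to raises ValueError if the path is not on the shared volume
    return AGENT_MOUNT / PurePosixPath(bot_path).relative_to(BOT_MOUNT)


def agent_to_bot_path(agent_path: str) -> PurePosixPath:
    """Translate an agent-side path back to where the bot can read it."""
    return BOT_MOUNT / PurePosixPath(agent_path).relative_to(AGENT_MOUNT)
```

For example, bot_to_agent_path("/agents/0/incoming/photo.jpg") gives /workspace/0/incoming/photo.jpg. That is why the agent's working root has to be /workspace/0/ for the relative path incoming/... to line up.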


Code Examples

AgentApi usage (verified from source)

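```python
# Source: external/platform-agent_api/lambda_agent_api/agent_api.py
import asyncio
from pathlib import Path

from lambda_agent_api.agent_api import AgentApi  # on sys.path via sdk/upstream_agent_api.py
# MsgEventTextChunk / MsgEventSendFile must also be imported from lambda_agent_api
# (exact module not verified here).

connected_agents: dict[str, AgentApi] = {}
agent_workspace_path = "/agents/0/"  # bot-side workspace_path from AgentDefinition


async def main() -> None:
    agent = AgentApi(
        agent_id="agent-0",
        base_url="ws://lambda.coredump.ru:7000/agent_0/",
        on_disconnect=lambda a: connected_agents.pop(a.id, None),
        chat_id=0,
    )
    await agent.connect()  # must be called before send_message
    connected_agents[agent.id] = agent
    try:
        async for event in agent.send_message("Hello", attachments=["incoming/photo.jpg"]):
            if isinstance(event, MsgEventTextChunk):
                print(event.text)
            elif isinstance(event, MsgEventSendFile):
                # event.path is workspace-relative, e.g. "output/report.pdf"
                abs_path = Path(agent_workspace_path) / event.path
    finally:
        await agent.close()  # triggers on_disconnect, which drops it from connected_agents


asyncio.run(main())
```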

Matrix file upload (verified from bot.py)

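```python
# Source: adapter/matrix/bot.py send_outgoing()
# (excerpt from inside the async function; client, room_id, attachment, file_path in scope)
from nio import UploadResponse

filename = attachment.filename or file_path.name
mime_type = attachment.mime_type or "application/octet-stream"
size = file_path.stat().st_size
with file_path.open("rb") as handle:
    upload_response, _ = await client.upload(
        handle,
        content_type=mime_type,
        filename=filename,
        filesize=size,
    )
if not isinstance(upload_response, UploadResponse):  # nio returns UploadError on failure
    raise RuntimeError(f"upload failed: {upload_response}")
await client.room_send(room_id, "m.room.message", {
    "msgtype": "m.file",     # or m.image, m.audio, m.video
    "body": filename,
    "url": upload_response.content_uri,
    "info": {"mimetype": mime_type, "size": size},
})
```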
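The "# or m.image, m.audio, m.video" comment implies picking the msgtype from the MIME type. Below is a sketch of that choice. It assumes the standard Matrix msgtypes and the info fields mimetype and size. The function names are hypothetical, not existing bot.py helpers.

```python
from __future__ import annotations


def matrix_msgtype_for(mime_type: str | None) -> str:
    """Pick a Matrix msgtype from the MIME major type; anything else is m.file."""
    major = (mime_type or "").split("/", 1)[0].strip().lower()
    return {"image": "m.image", "audio": "m.audio", "video": "m.video"}.get(major, "m.file")


def file_message_content(content_uri: str, filename: str, mime_type: str | None, size: int) -> dict:
    """Build the m.room.message content for an uploaded file."""
    mime = mime_type or "application/octet-stream"
    return {
        "msgtype": matrix_msgtype_for(mime),
        "body": filename,
        "url": content_uri,  # mxc:// URI from UploadResponse.content_uri
        "info": {"mimetype": mime, "size": size},
    }
```

Keeping the content builder pure means test_send_outgoing.py can assert on the dict without a live homeserver.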

YAML config extension (target format)

```yaml
# config/matrix-agents.yaml (new format per D-02/D-03)

user_agents:
  "@user0:matrix.lambda.coredump.ru": agent-0
  "@user1:matrix.lambda.coredump.ru": agent-1

agents:
  - id: agent-0
    label: "Agent 0"
    base_url: "ws://lambda.coredump.ru:7000/agent_0/"
    workspace_path: "/agents/0/"

  - id: agent-1
    label: "Agent 1"
    base_url: "ws://lambda.coredump.ru:7000/agent_1/"
    workspace_path: "/agents/1/"
```
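The following sketch parses the target format into the Pattern 1 types and runs two sanity checks: duplicate agent ids, and user_agents entries that point at unknown agents. It takes the already-parsed mapping so it stays dependency-free; in agent_registry.py the input would come from yaml.safe_load. registry_from_mapping is a hypothetical name. AgentDefinition is repeated here only to keep the example self-contained.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class AgentDefinition:
    agent_id: str
    label: str
    base_url: str
    workspace_path: str


def registry_from_mapping(data: dict[str, Any]) -> tuple[list[AgentDefinition], dict[str, str]]:
    """Parse {"user_agents": {...}, "agents": [...]} into definitions + user map."""
    agents: list[AgentDefinition] = []
    for raw in data.get("agents") or []:
        agent_id = str(raw["id"])  # YAML uses "id"; the dataclass uses agent_id
        agents.append(
            AgentDefinition(
                agent_id=agent_id,
                label=str(raw.get("label", agent_id)),
                base_url=str(raw["base_url"]),
                workspace_path=str(raw["workspace_path"]),
            )
        )

    known_ids = [a.agent_id for a in agents]
    if len(set(known_ids)) != len(known_ids):
        raise ValueError(f"duplicate agent ids in registry: {known_ids}")

    user_agents = {str(user): str(agent) for user, agent in (data.get("user_agents") or {}).items()}
    unknown = sorted(set(user_agents.values()) - set(known_ids))
    if unknown:
        raise ValueError(f"user_agents references unknown agents: {unknown}")
    return agents, user_agents
```

Usage would be agents, user_agents = registry_from_mapping(yaml.safe_load(path.read_text(encoding="utf-8"))), followed by AgentRegistry(agents, user_agents). Failing at load time keeps a typo in user_agents from surfacing later as a MATRIX_AGENT_NOT_FOUND error on the first message.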

Runtime State Inventory

Phase includes refactoring but NOT renaming of string identifiers in user-facing data. Users interacting with the old multi-room bot will have SQLite room_meta records with old schema keys.

| Category | Items Found | Action Required |
|---|---|---|
| Stored data (SQLite) | lambda_matrix.db (dev). Room meta records contain chat_id: "C1", space_id, redirect_room_id, agent_id from the old multi-room flow. | No migration: per Claude's Discretion in CONTEXT.md, existing Space+rooms are ignored, not migrated. New users get DM-first. Old users' DM rooms will lack the welcomed key; the first message in such a room goes through the normal message dispatch path (acceptable). |
| Stored data (SQLite) | selected_agent_id key in user metadata, written by the !agent command being deleted. | No migration needed: !agent is gone and routing now uses user_agents from the YAML config. Old selected_agent_id values are orphaned but harmless. |
| Live service config | No external services with stored config (no n8n, no Datadog). | None. |
| OS-registered state | None. Bot runs in Docker, no launchd/systemd registration. | None. |
| Secrets/env vars | AGENT_BASE_URL (global) → replaced by per-agent base_url in YAML. SURFACES_WORKSPACE_DIR (global workspace) → per-agent workspace_path from YAML. | Both env vars become deprecated for prod but remain for backward compatibility in dev. Update .env.example. Add a .env.prod template. |
| Build artifacts | None in prod context. Local: .venv, __pycache__ (unaffected). | None. |

Validation Architecture

Test Framework

| Property | Value |
|---|---|
| Framework | pytest 9.0.2 + pytest-asyncio |
| Config file | pyproject.toml (asyncio_mode = "auto") |
| Quick run command | uv run pytest tests/adapter/matrix/ -q |
| Full suite command | uv run pytest tests/ -q |

Current Test Status (pre-Phase-05)

| File | Status | Disposition in Phase 05 |
|---|---|---|
| test_converter.py | 14 passing | Keep as-is |
| test_files.py | 2 passing | Update for new path format |
| test_reactions.py | 2 passing | Keep as-is |
| test_restart_persistence.py | 5 passing | Keep; update if routing logic changes |
| test_routing_enforcement.py | 5 passing | Update for user_agents routing model |
| test_store.py | 2 passing | Keep as-is |
| test_agent_handler.py | failing (import?) | DELETE: !agent is deleted |
| test_agent_registry.py | failing (import?) | REWRITE: test new AgentDefinition schema |
| test_chat_space.py | failing | DELETE: Space provisioning deleted |
| test_confirm.py | failing | Keep or update |
| test_context_commands.py | 4 failing | REWRITE: !save/!load deleted; keep !context, add !clear |
| test_dispatcher.py | 20 failing | REWRITE: DM-first flow replaces multi-room |
| test_invite_space.py | 3 failing | DELETE and REPLACE with DM-first invite tests |
| test_routed_platform.py | 1 failing | REWRITE: user_agents lookup replaces room binding |
| test_send_outgoing.py | failing | REWRITE: per-agent workspace_path |

Phase Requirements → Test Map

| Behavior | Test Type | Automated Command | Wave |
|---|---|---|---|
| AgentRegistry parses new YAML format (user_agents + base_url/workspace_path) | unit | uv run pytest tests/adapter/matrix/test_agent_registry.py -x | Wave 1 |
| Unauthorized user gets access-denied message on invite | unit | uv run pytest tests/adapter/matrix/test_invite_dm.py -x | Wave 2 |
| Authorized user gets welcome on DM invite | unit | uv run pytest tests/adapter/matrix/test_invite_dm.py -x | Wave 2 |
| Message from authorized user routes to correct delegate | unit | uv run pytest tests/adapter/matrix/test_routed_platform.py -x | Wave 2 |
| Incoming file saved to incoming/{filename} under agent workspace | unit | uv run pytest tests/adapter/matrix/test_files.py -x | Wave 3 |
| !clear command returns "Контекст сброшен." | unit | uv run pytest tests/adapter/matrix/test_context_commands.py -x | Wave 2 |
| Full suite green | integration | uv run pytest tests/ -q | Phase gate |

Wave 0 Gaps

  • tests/adapter/matrix/test_invite_dm.py — DM-first invite flow (new file)
  • Updated tests/adapter/matrix/test_agent_registry.py — new schema

(All other existing test infrastructure is in place. No new framework install needed.)


Environment Availability

| Dependency | Required By | Available | Version | Fallback |
|---|---|---|---|---|
| uv / Python 3.11 | tests, bot run | ✓ | Python 3.11.9, pytest 9.0.2 | |
| Docker | docker-compose.prod.yml | ✓ (assumed dev machine) | | Manual install |
| matrix-nio | Matrix adapter | ✓ (installed in .venv) | | |
| pyyaml | agent_registry.py | ✓ (yaml import works in bot context) | | |
| lambda-agent:latest image | docker-compose.prod.yml | ✗ (placeholder; platform team owns) | | Use build: ./external/platform-agent for local testing |

Missing dependencies (no production fallback):

  • lambda-agent:latest: docker-compose.prod.yml uses this placeholder image name. For local testing only, use the build: ./external/platform-agent fallback or an image: busybox stub.

Open Questions

  1. is_direct detection for group room rejection (D-05, Claude's Discretion)

    • What we know: matrix-nio's InviteMemberEvent does not expose an is_direct flag as a dedicated attribute. MatrixRoom exposes the member count via room.member_count or room.joined_members.
    • What's unclear: Whether InviteMemberEvent or MatrixRoom in nio exposes enough information to reliably distinguish a DM from a group room at invite time.
    • Recommendation: In Phase 05, accept all invites and immediately check the inviter against user_agents. Group rooms to which an authorized user invites the bot will also work, which is harmless. Add a room.member_count <= 2 check if stricter DM-only behavior is wanted.
  2. True !clear semantics with MemorySaver

    • What we know: RealPlatformClient._build_chat_api creates a new AgentApi(chat_id="0") per request. The agent's MemorySaver is keyed by chat_id, which is always "0", so reconnecting does NOT clear the context.
    • What's unclear: Whether !clear should actually reset the context (which requires the platform to support a reset endpoint or a different chat_id) or only show a user-facing confirmation (acceptable for the MVP).
    • Recommendation: In Phase 05, send "Контекст сброшен." immediately (D-11) and document the limitation. An actual context reset is a platform concern.
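To make the limitation concrete, the toy model below mimics a checkpointer keyed by chat_id. It is neither LangGraph's MemorySaver API nor AgentApi. ToyCheckpointer, ToyAgentConnection, and the rotated id "0-reset-1" are hypothetical. It shows that close() plus connect() with the same chat_id leaves the history intact. Only a different chat_id, or a platform-side reset, yields an empty context, and D-01 currently rules out changing the id.

```python
from collections import defaultdict


class ToyCheckpointer:
    """Stand-in for agent-side memory keyed by chat_id (not a real library API)."""

    def __init__(self) -> None:
        self._history: dict[str, list[str]] = defaultdict(list)

    def append(self, chat_id: str, message: str) -> None:
        self._history[chat_id].append(message)

    def history(self, chat_id: str) -> list[str]:
        return list(self._history.get(chat_id, []))


class ToyAgentConnection:
    """Stand-in for a bot-side connection; it holds no conversation state itself."""

    def __init__(self, store: ToyCheckpointer, chat_id: str) -> None:
        self.store = store
        self.chat_id = chat_id

    def send(self, text: str) -> list[str]:
        self.store.append(self.chat_id, text)
        return self.store.history(self.chat_id)


store = ToyCheckpointer()  # lives in the agent process and survives bot reconnects

conn = ToyAgentConnection(store, chat_id="0")
conn.send("hello")

# "!clear" as close() + connect(): a fresh connection, but the same chat_id.
conn = ToyAgentConnection(store, chat_id="0")
after_reconnect = conn.send("are you there?")  # ["hello", "are you there?"]

# Hypothetical real reset: the platform starts a new thread under a different id.
conn = ToyAgentConnection(store, chat_id="0-reset-1")
after_new_thread = conn.send("fresh start")  # ["fresh start"]
```

The point of the model is where the state lives. Because history belongs to the agent-side store, nothing the bot does to its own connection object can clear it.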
  3. lambda-agent:latest image name

    • What we know: D-08 says "placeholder image lambda-agent:latest; to be confirmed with Azamat".
    • Recommendation: Use lambda-agent:latest as the image name in docker-compose.prod.yml, with a comment marking it as a placeholder. Provide a build: fallback pointing to ./external/platform-agent for local dev validation.

Assumptions Log

| # | Claim | Section | Risk if Wrong |
|---|---|---|---|
| A1 | lambda-agent:latest is the agreed image name for the agent container | docker-compose section | docker-compose.prod.yml won't work; easy to fix by updating the image name |
| A2 | Group-room invite detection is not required for Phase 05 ("DM-first" only means "start in DM", not "reject group invites") | DM-first onboarding | If group-room rejection IS required, the matrix-nio InviteMemberEvent structure must be investigated |
| A3 | !clear in Phase 05 is cosmetic (it shows "cleared", but MemorySaver state persists until the agent restarts) | !clear section | Users may be confused if they expect a real context reset |

Project Constraints (from CLAUDE.md)

| Directive | Implication for Phase 05 |
|---|---|
| Platform calls go through platform/interface.py (Protocol) | RealPlatformClient stays the SDK boundary; AgentApi is internal to sdk/ |
| When the real SDK is connected, only platform/mock.py changes | Phase 05 touches sdk/real.py for workspace_path; this is acceptable because it is a refinement, not a rewrite |
| Hotfixes (< 20 lines) go to Claude Code directly, not Codex | Phase 05 exceeds 20 lines, so it must go through Codex via GSD |
| Implementation is done by codex:rescue | Plans must be in PLAN.md format so they can be handed to Codex |
| Never commit .env | .env.prod must be in .gitignore; only .env.prod.example is committed |
| uv sync for dependencies | No new pip installs; all dependencies are already in pyproject.toml |
| pytest tests/ for tests | Phase gate: uv run pytest tests/ -q must be green |

Sources

Primary (HIGH confidence)

  • [VERIFIED: adapter/matrix/agent_registry.py] — current AgentDefinition/AgentRegistry structure
  • [VERIFIED: adapter/matrix/bot.py] — _build_platform_from_env, MatrixBot, handle_invite, _materialize_incoming_attachments
  • [VERIFIED: adapter/matrix/routed_platform.py] — _resolve_delegate logic
  • [VERIFIED: adapter/matrix/files.py] — build_workspace_attachment_path, download_matrix_attachment
  • [VERIFIED: adapter/matrix/handlers/agent.py] — !agent handler (to be deleted)
  • [VERIFIED: adapter/matrix/handlers/auth.py] — provision_workspace_chat (to be replaced)
  • [VERIFIED: adapter/matrix/handlers/context_commands.py] — !save/!load/!reset handlers
  • [VERIFIED: adapter/matrix/handlers/init.py] — handler registration
  • [VERIFIED: sdk/real.py] — RealPlatformClient, _build_chat_api, _attachment_from_send_file_event
  • [VERIFIED: sdk/upstream_agent_api.py] — sys.path patching, AgentApi import
  • [VERIFIED: external/platform-agent_api/lambda_agent_api/agent_api.py] — actual AgentApi implementation
  • [VERIFIED: config/matrix-agents.yaml] — current format
  • [VERIFIED: docker-compose.yml] — existing dev compose topology
  • [VERIFIED: .env.example] — current env var set
  • [VERIFIED: docs/deploy-architecture.md] — prod topology spec
  • [VERIFIED: .planning/phases/05-mvp-deployment/05-CONTEXT.md] — locked decisions

Secondary (MEDIUM confidence)

  • [ASSUMED: A1] lambda-agent image name — from CONTEXT.md D-08 description
  • [ASSUMED: A2] Group room handling scope — inferred from D-05 wording

Metadata

Confidence breakdown:

  • Standard stack: HIGH — all libraries verified in existing code
  • Architecture patterns: HIGH — all patterns verified against actual source files
  • Pitfalls: HIGH — all pitfalls derived from reading actual code, not from training assumptions
  • Test strategy: HIGH — test files enumerated and statuses verified by running pytest

Research date: 2026-04-27

Valid until: 2026-05-27 (stable codebase; re-validate sooner if platform-agent_api changes)