surfaces/.planning/phases/01.1-matrix-restart-reconciliation-and-dev-reset-workflow/01.1-02-PLAN.md

---
phase: 01.1-matrix-restart-reconciliation-and-dev-reset-workflow
plan: 02
type: execute
wave: 2
depends_on:
  - "01.1-01"
files_modified:
  - adapter/matrix/bot.py
  - tests/adapter/matrix/test_dispatcher.py
autonomous: true
requirements:
must_haves:
  truths:
    - "The Matrix bot performs an initial sync and reconciliation before entering steady-state `sync_forever()`."
    - "If a room still arrives as `unregistered:{room_id}` after startup, the bot makes one targeted recovery attempt before dispatching or failing."
    - "When reconciliation cannot repair a room, the bot logs a clear diagnostic reason instead of crashing on downstream commands like `!rename`."
  artifacts:
    - path: adapter/matrix/bot.py
      provides: "Startup bootstrap flow with initial sync, reconciliation, and targeted runtime retry."
    - path: tests/adapter/matrix/test_dispatcher.py
      provides: "Matrix runtime coverage for pre-sync reconcile and on-message recovery behavior."
  key_links:
    - from: adapter/matrix/bot.py
      to: adapter/matrix/reconcile.py
      via: "startup bootstrap and single-room recovery calls"
      pattern: "reconcile_(matrix_state|single_room)"
    - from: adapter/matrix/bot.py
      to: adapter/matrix/room_router.py
      via: "unregistered room detection before dispatch"
      pattern: "unregistered:"
---

Wire the new reconciliation layer into the actual Matrix runtime.

Purpose: D-05 through D-07 require restart recovery to be the default developer path. The bot must bootstrap itself from existing Matrix rooms on startup and make one on-demand repair attempt before routing an unknown room through the dispatcher.

Output: `adapter/matrix/bot.py` performs initial sync + reconciliation before `sync_forever()`, and runtime tests prove the bot recovers or logs clearly instead of blindly dispatching broken state.

<execution_context> @/Users/a/.codex/get-shit-done/workflows/execute-plan.md @/Users/a/.codex/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01.1-matrix-restart-reconciliation-and-dev-reset-workflow/01.1-CONTEXT.md
@.planning/phases/01.1-matrix-restart-reconciliation-and-dev-reset-workflow/01.1-RESEARCH.md
@.planning/phases/01.1-matrix-restart-reconciliation-and-dev-reset-workflow/01.1-01-PLAN.md
@adapter/matrix/bot.py
@adapter/matrix/room_router.py
@adapter/matrix/reconcile.py
@tests/adapter/matrix/test_dispatcher.py

From adapter/matrix/bot.py:

class MatrixBot:
    async def on_room_message(self, room: MatrixRoom, event: RoomMessageText) -> None

async def main() -> None

From adapter/matrix/reconcile.py:

async def reconcile_matrix_state(client: Any, store: StateStore, chat_mgr: ChatManager) -> dict
async def reconcile_single_room(
    client: Any, store: StateStore, chat_mgr: ChatManager, room_id: str, matrix_user_id: str
) -> dict

From adapter/matrix/room_router.py:

async def resolve_chat_id(store: StateStore, room_id: str, matrix_user_id: str) -> str

Task 1: Run initial sync and reconciliation before the long-poll loop

Files: adapter/matrix/bot.py, tests/adapter/matrix/test_dispatcher.py

Read first: adapter/matrix/bot.py, adapter/matrix/reconcile.py, tests/adapter/matrix/test_dispatcher.py, .planning/phases/01.1-matrix-restart-reconciliation-and-dev-reset-workflow/01.1-RESEARCH.md

- Test 1: `main()` performs `client.sync(timeout=0, full_state=True)` before `sync_forever()`.
- Test 2: `main()` calls `reconcile_matrix_state(...)` after the initial sync and logs the returned report.
- Test 3: startup still reaches `sync_forever()` when reconciliation reports recoverable skips/conflicts instead of fatal failure.

Modify `adapter/matrix/bot.py` so normal startup follows the two-phase bootstrap recommended in research:

1. build client and runtime
2. authenticate
3. register callbacks
4. run `await client.sync(timeout=0, full_state=True)`
5. run `await reconcile_matrix_state(client, runtime.store, runtime.chat_mgr)`
6. log a structured `matrix_reconcile_complete` event with the report fields
7. enter `await client.sync_forever(timeout=30000)`

Do not move provisioning logic into startup. The startup step only rehydrates local state from server-side rooms per D-02 through D-04.
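
The ordering in steps 4 through 7 can be sketched as follows. This is only an illustration under stated assumptions, not the contents of `adapter/matrix/bot.py`: `bootstrap`, `FakeClient`, `fake_reconcile`, and the report keys `restored` and `skipped` are hypothetical names, and the fake client mirrors only the two calls the plan names.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, List

logger = logging.getLogger("matrix.bot")


async def bootstrap(
    client: Any,
    store: Any,
    chat_mgr: Any,
    reconcile: Callable[[Any, Any, Any], Awaitable[dict]],
) -> dict:
    """Hypothetical helper: one-shot sync, reconcile, then steady-state polling."""
    # Phase 1: a single full-state sync so joined rooms are known to the client.
    await client.sync(timeout=0, full_state=True)
    # Rehydrate local state from server-side rooms; nothing is provisioned here.
    report = await reconcile(client, store, chat_mgr)
    # Structured summary, logged even when the report contains skips or conflicts.
    logger.info("matrix_reconcile_complete", extra={"report": report})
    # Phase 2: the long-poll loop starts only after reconciliation has returned.
    await client.sync_forever(timeout=30000)
    return report


class FakeClient:
    """Records call order; stands in for the real Matrix client in this sketch."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    async def sync(self, timeout: int, full_state: bool) -> None:
        self.calls.append("sync")

    async def sync_forever(self, timeout: int) -> None:
        self.calls.append("sync_forever")


async def fake_reconcile(client: Any, store: Any, chat_mgr: Any) -> dict:
    client.calls.append("reconcile")
    # Assumed report shape: a recoverable skip must not block startup.
    return {"restored": 0, "skipped": 1}


if __name__ == "__main__":
    demo_client = FakeClient()
    asyncio.run(bootstrap(demo_client, None, None, fake_reconcile))
    print(demo_client.calls)
```

Passing the reconcile coroutine in as a parameter is a choice made for this sketch so the order can be observed with a fake; the plan itself calls `reconcile_matrix_state(...)` directly from `main()`.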

Update or add focused tests in `tests/adapter/matrix/test_dispatcher.py` using monkeypatch/fake-client patterns already used in the repo so the verify command proves the call order and logging-safe behavior. The test should fail if `sync_forever()` starts before reconciliation.

cd /Users/a/MAI/sem2/lambda/surfaces-bot && pytest tests/adapter/matrix/test_dispatcher.py -q

<acceptance_criteria>

  • adapter/matrix/bot.py runs an initial full-state sync before steady-state polling.
  • adapter/matrix/bot.py invokes reconcile_matrix_state(...) exactly once during startup.
  • Startup logs a structured reconciliation summary instead of silently skipping the recovery step.
  • tests/adapter/matrix/test_dispatcher.py asserts the bootstrap order explicitly.
</acceptance_criteria>

Normal Matrix bot startup now includes a recovery pass before the event loop begins handling user traffic.
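
One way a test could state the bootstrap order explicitly is an assertion over a recorded call log. The sketch below is an assumption about how such a check might look, not the repo's existing test helper: `assert_bootstrap_order` and the recorded call names are hypothetical. Because it compares the filtered log with the exact expected sequence, it also fails when reconciliation runs more than once.

```python
from typing import List

# Hypothetical call names a recording fake might append, in the required order.
EXPECTED_ORDER = ["sync", "reconcile_matrix_state", "sync_forever"]


def assert_bootstrap_order(calls: List[str]) -> None:
    """Fail unless the three bootstrap calls each occur once, in order."""
    # Ignore unrelated calls (login, callback registration) recorded by the fake.
    observed = [name for name in calls if name in EXPECTED_ORDER]
    assert observed == EXPECTED_ORDER, f"unexpected bootstrap order: {observed}"


if __name__ == "__main__":
    # Passes: unrelated calls are ignored and the order is correct.
    assert_bootstrap_order(["login", "sync", "reconcile_matrix_state", "sync_forever"])
```
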

Task 2: Retry unknown-room routing once before dispatching broken state

Files: adapter/matrix/bot.py, tests/adapter/matrix/test_dispatcher.py

Read first: adapter/matrix/bot.py, adapter/matrix/room_router.py, adapter/matrix/reconcile.py, tests/adapter/matrix/test_dispatcher.py, .planning/phases/01.1-matrix-restart-reconciliation-and-dev-reset-workflow/01.1-CONTEXT.md

- Test 1: `MatrixBot.on_room_message(...)` detects `unregistered:{room_id}`, runs `reconcile_single_room(...)`, then retries `resolve_chat_id(...)`.
- Test 2: if retry succeeds, the event is dispatched against the recovered logical chat id.
- Test 3: if retry still fails, the bot does not crash; it logs a clear warning and sends a user-facing diagnostic message to that room.

Extend `MatrixBot.on_room_message(...)` so D-07 is satisfied even when startup could not repair a room yet. Keep `resolve_chat_id(...)` as the room-router source of truth, but treat `unregistered:{room_id}` as a recovery trigger rather than a stable runtime identity:

- first call `resolve_chat_id(...)`
- if the result starts with `unregistered:`, call `reconcile_single_room(client, runtime.store, runtime.chat_mgr, room.room_id, event.sender)`
- immediately retry `resolve_chat_id(...)`
- only dispatch once a concrete logical chat id exists
- if the retry still returns `unregistered:{room_id}`, log a structured warning with room id, matrix user id, and reconciliation report, then send a short `OutgoingMessage`-equivalent Matrix text explaining that local state could not be restored automatically and a dev reset/restart may be required
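
The resolve, repair once, retry flow described above can be sketched in isolation. Everything named here is a stand-in chosen for illustration: `resolve_with_recovery`, the `matrix_room_unrecoverable` event name, the report keys, and the dict-backed store are assumptions, and the real handler would call `resolve_chat_id(...)` and `reconcile_single_room(...)` with their actual arguments.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict, Optional, Tuple

logger = logging.getLogger("matrix.bot")
UNREGISTERED_PREFIX = "unregistered:"


async def resolve_with_recovery(
    resolve: Callable[[], Awaitable[str]],
    reconcile_room: Callable[[], Awaitable[dict]],
    room_id: str,
    matrix_user_id: str,
) -> Optional[str]:
    """Return a concrete chat id, or None when one repair attempt did not help."""
    chat_id = await resolve()
    if not chat_id.startswith(UNREGISTERED_PREFIX):
        return chat_id
    # Exactly one targeted repair attempt, followed by an immediate retry.
    report = await reconcile_room()
    chat_id = await resolve()
    if chat_id.startswith(UNREGISTERED_PREFIX):
        # No fallback id is invented; the caller sends the diagnostic text.
        logger.warning(
            "matrix_room_unrecoverable",
            extra={"room_id": room_id, "matrix_user_id": matrix_user_id, "report": report},
        )
        return None
    return chat_id


async def demo() -> Tuple[Optional[str], Optional[str]]:
    room_id, user_id = "!room:example.org", "@dev:example.org"
    store: Dict[str, str] = {}  # stand-in for StateStore: room id -> chat id

    async def resolve() -> str:
        return store.get(room_id, f"{UNREGISTERED_PREFIX}{room_id}")

    async def repair_fails() -> dict:
        return {"restored": 0}

    async def repair_succeeds() -> dict:
        store[room_id] = "chat-1"
        return {"restored": 1}

    failed = await resolve_with_recovery(resolve, repair_fails, room_id, user_id)
    recovered = await resolve_with_recovery(resolve, repair_succeeds, room_id, user_id)
    return failed, recovered


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A caller would dispatch only when the return value is not `None`, and otherwise send the short diagnostic message to the room, which keeps the "no fallback chat id, no room creation" constraint visible at a single branch.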

Do not invent a new fallback chat id and do not auto-create rooms here; that would violate D-04. Keep this change inside `adapter/matrix/bot.py` so file ownership stays isolated for this plan.

cd /Users/a/MAI/sem2/lambda/surfaces-bot && pytest tests/adapter/matrix/test_dispatcher.py -q

<acceptance_criteria>

  • Unknown Matrix rooms trigger one targeted reconciliation attempt before dispatch.
  • Successful targeted recovery leads to normal dispatch with a real logical chat_id.
  • Failed targeted recovery logs a clear diagnostic and avoids a handler crash on missing chat state per D-06.
  • No code path in this task provisions new Matrix rooms or Spaces.
</acceptance_criteria>

The runtime treats unknown rooms as recoverable state drift first, not as a silent routing failure or crash path.

Run `pytest tests/adapter/matrix/test_dispatcher.py -q` and confirm both startup-bootstrap and first-access recovery behaviors are covered.

<success_criteria>

  • A standard Matrix restart now attempts recovery before the bot starts processing live events.
  • Unknown-room events are diagnosable and recoverable instead of falling straight into broken command handling.
  • The runtime never provisions new server-side rooms during restart reconciliation.
</success_criteria>

After completion, create `.planning/phases/01.1-matrix-restart-reconciliation-and-dev-reset-workflow/01.1-02-SUMMARY.md`