surfaces/.planning/phases/05-mvp-deployment/05-01-PLAN.md

---
phase: 05-mvp-deployment
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- adapter/matrix/reconciliation.py
- adapter/matrix/bot.py
- tests/adapter/matrix/test_reconciliation.py
- tests/adapter/matrix/test_restart_persistence.py
autonomous: true
requirements:
- PH05-01
- PH05-03
must_haves:
  truths:
    - "On restart, existing Matrix Space and child-room topology is rebuilt before live sync begins."
    - "Restart recovery preserves Space+rooms UX instead of creating duplicate DM-style working rooms."
    - "Recovered rooms regain user metadata, room metadata, and chat bindings needed for normal routing."
    - "Legacy working rooms missing `platform_chat_id` are backfilled deterministically during startup before strict routing handles traffic."
  artifacts:
    - path: "adapter/matrix/reconciliation.py"
      provides: "Authoritative restart reconciliation from Matrix topology into local metadata"
    - path: "adapter/matrix/bot.py"
      provides: "Startup wiring that runs reconciliation before sync_forever"
    - path: "tests/adapter/matrix/test_reconciliation.py"
      provides: "Regression coverage for startup recovery and idempotence"
  key_links:
    - from: "adapter/matrix/bot.py"
      to: "adapter/matrix/reconciliation.py"
      via: "startup bootstrap before sync_forever"
      pattern: "reconcil"
    - from: "adapter/matrix/reconciliation.py"
      to: "core/chat.py"
      via: "chat manager rebuild for recovered rooms"
      pattern: "get_or_create"
---
<objective>
Rebuild Matrix-local routing state from authoritative Space topology before the bot processes live traffic.
Purpose: Preserve the Phase 01 Space+rooms contract after restart even if SQLite metadata is partial or missing.
Output: A startup reconciliation module, bot wiring, and regression tests proving no DM-first duplication on restart.
</objective>
<execution_context>
@/Users/a/.codex/get-shit-done/workflows/execute-plan.md
@/Users/a/.codex/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/05-mvp-deployment/05-RESEARCH.md
@.planning/phases/05-mvp-deployment/05-VALIDATION.md
@.planning/phases/04-matrix-mvp-shared-agent-context-and-context-management-comma/04-02-SUMMARY.md
@adapter/matrix/bot.py
@adapter/matrix/store.py
@adapter/matrix/handlers/auth.py
@tests/adapter/matrix/test_invite_space.py
@tests/adapter/matrix/test_chat_space.py
@tests/adapter/matrix/test_restart_persistence.py
<interfaces>
From `adapter/matrix/bot.py`:
```python
async def prepare_live_sync(client: AsyncClient) -> str | None:
    response = await client.sync(timeout=0, full_state=True)
    if isinstance(response, SyncResponse):
        return response.next_batch
    return None
```
```python
class MatrixBot:
    async def _bootstrap_unregistered_room(
        self,
        room: MatrixRoom,
        sender: str,
    ) -> list[OutgoingEvent] | None: ...
```
From `adapter/matrix/store.py`:
```python
async def get_room_meta(store: StateStore, room_id: str) -> dict | None: ...
async def set_room_meta(store: StateStore, room_id: str, meta: dict) -> None: ...
async def get_user_meta(store: StateStore, matrix_user_id: str) -> dict | None: ...
async def set_user_meta(store: StateStore, matrix_user_id: str, meta: dict) -> None: ...
async def next_platform_chat_id(store: StateStore) -> str: ...
```
</interfaces>
</context>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Add restart reconciliation regression coverage</name>
<files>tests/adapter/matrix/test_reconciliation.py, tests/adapter/matrix/test_restart_persistence.py</files>
<read_first>tests/adapter/matrix/test_invite_space.py, tests/adapter/matrix/test_chat_space.py, tests/adapter/matrix/test_restart_persistence.py, adapter/matrix/bot.py, adapter/matrix/handlers/auth.py, .planning/phases/05-mvp-deployment/05-RESEARCH.md</read_first>
<behavior>
- Test 1: startup recovery rebuilds user space metadata, room metadata, and chat bindings from Matrix topology without creating new working rooms (per D-Phase05-reset and PH05-01).
- Test 2: reconciliation is idempotent and safe when local SQLite state is already present.
- Test 3: reconciliation happens before lazy `_bootstrap_unregistered_room()` would run for existing rooms (per PH05-03).
- Test 4: legacy room metadata missing `platform_chat_id` is backfilled deterministically at startup and persisted before routed handling begins.
</behavior>
<acceptance_criteria>
- `tests/adapter/matrix/test_reconciliation.py` exists and names reconciliation entrypoints explicitly.
- The new tests assert restored `space_id`, `chat_id`, `matrix_user_id`, and `platform_chat_id` values for recovered rooms.
- The regression slice also proves existing Space onboarding behavior still passes by running `test_invite_space.py` and `test_chat_space.py`.
- The automated command in `<verify>` fails before implementation or would fail if reconciliation is removed.
</acceptance_criteria>
<action>Create a dedicated `tests/adapter/matrix/test_reconciliation.py` module and extend restart persistence coverage so Phase 05 has a real Wave 0 contract. Model the recovered topology after the Phase 01 Space+rooms onboarding tests, not a DM-first flow, and explicitly keep those onboarding regressions in the verification slice so restart hardening cannot break provisioning UX. Cover recovery of `user_meta`, `room_meta`, `ChatManager` bindings, and room-local routing fields from Matrix-side state before live callbacks begin, including deterministic backfill for legacy rooms that predate `platform_chat_id`. Keep temporary UX state out of scope, per research.</action>
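The idempotence and "no new rooms" expectations above can be sketched as a pytest-style check. This is only an illustration under stated assumptions: `reconcile_room_meta`, the dict-based topology shape, and the `"chat"` room type are hypothetical stand-ins, not the real entrypoint or the real `StateStore` API.

```python
from typing import Dict, List

# Hypothetical shapes: a topology maps each Space ID to its owner and child rooms.
Topology = Dict[str, dict]
RoomMeta = Dict[str, dict]


def reconcile_room_meta(topology: Topology, room_meta: RoomMeta) -> List[str]:
    """Fill in missing room metadata from Space topology.

    Returns the room IDs whose metadata was created on this pass.
    Existing entries are left untouched, so a repeat run is a no-op.
    """
    created: List[str] = []
    for space_id in sorted(topology):
        space = topology[space_id]
        for room_id in sorted(space["children"]):
            if room_id in room_meta:
                continue  # valid local metadata stays stable
            room_meta[room_id] = {
                "room_type": "chat",  # assumed value, for illustration only
                "space_id": space_id,
                "matrix_user_id": space["owner"],
            }
            created.append(room_id)
    return created


def test_reconciliation_is_idempotent() -> None:
    topology = {
        "!space:example.org": {
            "owner": "@alice:example.org",
            "children": ["!room-b:example.org", "!room-a:example.org"],
        }
    }
    room_meta: RoomMeta = {}

    first = reconcile_room_meta(topology, room_meta)
    snapshot = {room_id: dict(meta) for room_id, meta in room_meta.items()}
    second = reconcile_room_meta(topology, room_meta)

    # First pass recovers both rooms; second pass writes nothing.
    assert first == ["!room-a:example.org", "!room-b:example.org"]
    assert second == []
    assert room_meta == snapshot
```

A real test would drive the actual reconciliation entrypoint against a fake client and store, and would also assert that no room-creation call was made.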
<verify>
<automated>pytest tests/adapter/matrix/test_invite_space.py tests/adapter/matrix/test_chat_space.py tests/adapter/matrix/test_reconciliation.py tests/adapter/matrix/test_restart_persistence.py -v</automated>
</verify>
<done>Phase 05 has tests that fail before implementation, define authoritative restart reconciliation behavior, and rule out duplicate room provisioning.</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Implement authoritative startup reconciliation and wire it before live sync</name>
<files>adapter/matrix/reconciliation.py, adapter/matrix/bot.py</files>
<read_first>adapter/matrix/bot.py, adapter/matrix/store.py, adapter/matrix/handlers/auth.py, tests/adapter/matrix/test_reconciliation.py, tests/adapter/matrix/test_restart_persistence.py, .planning/phases/05-mvp-deployment/05-RESEARCH.md</read_first>
<behavior>
- Test 1: startup rebuild runs after login and initial full-state fetch, but before `sync_forever()` processes live events.
- Test 2: recovered rooms keep their existing Space+rooms identity and do not trigger `_bootstrap_unregistered_room()` unless the room is genuinely new.
- Test 3: local metadata can be rebuilt from Matrix topology when SQLite entries are missing, while existing valid metadata remains stable.
- Test 4: startup repair assigns a deterministic `platform_chat_id` to legacy rooms missing that field and persists it before routed platform calls can occur.
</behavior>
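The ordering constraint in Test 1 can be sketched with a recording fake. `FakeClient` and `run_startup` are hypothetical names for illustration; the fake only mimics the two client calls this plan mentions and does not reproduce the real client's full signatures.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class FakeClient:
    """Stand-in for the Matrix client that records call order (hypothetical)."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    async def sync(self, timeout: int = 0, full_state: bool = False) -> str:
        self.calls.append("initial_sync")
        return "batch-1"  # plays the role of next_batch

    async def sync_forever(self, timeout: int, since: Optional[str]) -> None:
        self.calls.append(f"sync_forever(since={since})")


async def run_startup(
    client: FakeClient,
    reconcile: Callable[[], Awaitable[None]],
) -> None:
    # 1. Initial full-state fetch so joined Spaces and rooms are known.
    since = await client.sync(timeout=0, full_state=True)
    # 2. Rebuild local metadata while no live callbacks can fire.
    await reconcile()
    # 3. Only now start processing live events.
    await client.sync_forever(timeout=30000, since=since)


async def main() -> List[str]:
    client = FakeClient()

    async def reconcile() -> None:
        client.calls.append("reconcile")

    await run_startup(client, reconcile)
    return client.calls


calls = asyncio.run(main())
print(calls)
```

Asserting on the recorded call list is one way to make the "before `sync_forever()`" requirement fail loudly if the wiring is later reordered.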
<acceptance_criteria>
- `adapter/matrix/reconciliation.py` exports a focused reconciliation entrypoint used by startup code.
- `adapter/matrix/bot.py` invokes reconciliation before `client.sync_forever(...)`.
- Recovered room metadata includes `room_type`, `chat_id`, `space_id`, `matrix_user_id`, and `platform_chat_id` where available or rebuildable.
- Legacy rooms missing `platform_chat_id` follow one documented startup backfill path rather than ad hoc routing fallbacks.
</acceptance_criteria>
<action>Implement a restart recovery module that treats Matrix topology as authoritative, per the Phase 05 reset and research notes. Rebuild missing local metadata for Space-owned working rooms, deterministically backfill missing `platform_chat_id` values for legacy rooms, and re-create `ChatManager` entries needed by routing, while keeping SQLite as a rebuildable cache rather than the source of truth. Wire the new reconciliation step into startup after the initial full-state sync and before live sync begins, and keep the onboarding regression slice green while doing it. Do not widen into timeline scraping, new storage backends, or DM-first fallbacks.</action>
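One possible reading of "deterministic backfill" is sketched below: legacy rooms are processed in sorted room-ID order, so the assigned IDs do not depend on dict iteration or event arrival order. The function name, the integer counter standing in for `next_platform_chat_id()`, and the string ID format are all assumptions; the plan does not specify them.

```python
from typing import Dict, List


def backfill_platform_chat_ids(room_meta: Dict[str, dict], next_id: int) -> int:
    """Assign platform_chat_id to legacy rooms in sorted room-ID order.

    `next_id` stands in for the counter behind next_platform_chat_id();
    the integer-as-string format is an assumption for illustration.
    Returns the counter value to persist after the backfill.
    """
    legacy: List[str] = sorted(
        room_id
        for room_id, meta in room_meta.items()
        if not meta.get("platform_chat_id")
    )
    for room_id in legacy:
        room_meta[room_id]["platform_chat_id"] = str(next_id)
        next_id += 1
    return next_id


room_meta = {
    "!b:example.org": {"chat_id": "2"},
    "!a:example.org": {"chat_id": "1"},
    "!c:example.org": {"chat_id": "3", "platform_chat_id": "7"},
}
counter = backfill_platform_chat_ids(room_meta, next_id=8)
```

Rooms that already carry a `platform_chat_id` are skipped, so a second pass leaves both the metadata and the counter unchanged.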
<verify>
<automated>pytest tests/adapter/matrix/test_invite_space.py tests/adapter/matrix/test_chat_space.py tests/adapter/matrix/test_reconciliation.py tests/adapter/matrix/test_restart_persistence.py tests/adapter/matrix/test_dispatcher.py -v</automated>
</verify>
<done>Restart recovery restores the minimum durable state for existing Space rooms before live traffic, and the guarded regression suite passes.</done>
</task>
</tasks>
<verification>
Run the onboarding, reconciliation, restart-persistence, and Matrix dispatcher slices together. Confirm startup now has a deterministic pre-sync recovery and legacy-room backfill step instead of relying on lazy room bootstrap or routing-time fallbacks for existing topology.
</verification>
<success_criteria>
The bot can restart with partial or empty local room metadata, rebuild managed Space rooms before live sync, and continue handling those rooms without creating duplicate onboarding rooms.
</success_criteria>
<output>
After completion, create `.planning/phases/05-mvp-deployment/05-01-SUMMARY.md`
</output>