wip: 05-mvp-deployment paused at task 0/0

2026-04-30 18:04:24 +03:00 · 2026-04-30 18:04:24 +03:00 · 6369721876
commit 6369721876
parent 7e5f9c20a0
2 changed files with 93 additions and 42 deletions
--- a/.planning/phases/05-mvp-deployment/.continue-here.md
+++ b/.planning/phases/05-mvp-deployment/.continue-here.md
@@ -3,37 +3,60 @@ phase: 05-mvp-deployment
 phase_name: MVP deployment
 task: 0
 total_tasks: 0
-status: completed
-last_updated: 2026-04-28T21:07:17Z
+status: paused
+last_updated: 2026-04-30T15:03:14Z
 ---

 <current_state>
-Phase 05 deployment handoff is complete. Image rebuilt for linux/amd64 and handoff text prepared for platform team.
+Phase 05 code changes are in place, but the latest workspace-root attachment contract has not yet been published in a new production image. The last debugging step of this session confirmed that the user-to-agent config was correct apart from one exact-MXID mismatch: the homeserver suffix in `user_agents` did not match the real Matrix sender, so the fallback to the first agent was the expected behavior.
 </current_state>

 <completed_work>

-- Rebuilt image for linux/amd64 (was arm64 only): `mput1/surfaces-bot:latest`
-- Updated deploy handoff digest in .continue-here.md
-- Prepared deployment checklist text for platform
+- Fixed the path-based `base_url` normalization bug that caused WS connects to drop route prefixes.
+- Added WS lifecycle debug logging behind `SURFACES_DEBUG_WS=1`.
+- Added Matrix routing/recovery behavior:
+  - warning users when they are not listed in `user_agents`
+  - preserving room bindings across config updates
+  - re-inviting users back into their Space and active rooms after leave
+  - `!new` from the entry/DM room to create a fresh working chat
+- Reworked attachment handling so user files now go directly into the agent workspace root with Windows-style collision suffixes like `file (1).pdf`.
+- Updated docs and tests to match the new root-workspace file contract.
+- Verified that the recent “still goes to default agent” report was caused by exact MXID mismatch in config, not by YAML parsing or runtime routing logic.
+- Published earlier images:
+  - `mput1/surfaces-bot:debug-ws-20260429`
+  - `mput1/surfaces-bot:matrix-recovery-20260429`
 </completed_work>

 <remaining_work>

-- Platform needs to pull image and deploy
-- Awaiting smoke test confirmation from platform side
+- Build and publish a new production image that includes the latest workspace-root attachment changes.
+- Give the platform the new digest and ask them to redeploy the Matrix bot container.
+- Optionally run local smoke/fullstack validation once more before publishing if extra confidence is needed.
 </remaining_work>

 <decisions_made>

-- Rebuild for amd64 to match platform's production environment
+- Keep the fallback to the first agent when a user is missing from `user_agents`.
+- Require exact Matrix MXID match in `user_agents`; no fuzzy matching or homeserver normalization was added.
+- Warn the user in-band when default-agent fallback is used.
+- Keep room identity and `platform_chat_id` stable across config updates.
+- Require container restart for config changes; no image rebuild is needed for `matrix-agents.yaml` edits alone.
+- Remove `incoming/` and timestamp prefixes from the attachment contract.
+- Save uploaded user files directly at the workspace root and resolve collisions with copy-style suffixes.
 </decisions_made>

 <blockers>

-- None — implementation complete, awaiting platform deployment
+- No code blocker.
+- External dependency: platform redeploy after the next image publish.
+- Historical debt: placeholder summary/plan artifacts still exist in old Phase 04 files and were not cleaned up during this session.
 </blockers>

+<context>
+The current codebase should route correctly if the deployed config uses the exact real Matrix sender IDs, e.g. `@user:matrix.lambda.coredump.ru`. The next likely mistake during resume would be publishing the wrong image digest: the currently published recovery image predates the latest file-contract change. Resume by building a fresh image from the current worktree, not by reusing the old digest.
+</context>
+
 <next_action>
-Await platform deployment confirmation. No further implementation work needed until platform reports issues or requests changes.
-</next_action>
+Rebuild the production image from the current worktree, publish it, and send the new digest to the platform for redeploy.
+</next_action>
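A minimal Python sketch of the routing rule recorded in `decisions_made` above: an exact MXID lookup in `user_agents`, a fallback to the first agent, and a flag the caller can use to warn the user in-band. It assumes `user_agents` maps a full MXID to an agent name and that agents are kept in an ordered list; `resolve_agent`, `fallback_warning`, the `agents` list, the agent names, and the warning text are all hypothetical, not the bot's actual code. The `@user:example.org` key is a made-up stand-in for a config entry with the wrong homeserver suffix.

```python
def resolve_agent(sender: str, user_agents: dict[str, str], agents: list[str]) -> tuple[str, bool]:
    """Pick the agent for a Matrix sender; returns (agent, used_fallback).

    Hypothetical helper: assumes user_agents maps a full MXID such as
    "@user:matrix.lambda.coredump.ru" to an agent name.
    """
    # Exact string match only: no case folding, no homeserver normalization.
    agent = user_agents.get(sender)
    if agent is not None:
        return agent, False
    # Sender not listed: fall back to the first configured agent.
    return agents[0], True


def fallback_warning(sender: str, agent: str) -> str:
    # Hypothetical in-band warning text; the real bot's wording is not recorded here.
    return (f"{sender} is not listed in user_agents; "
            f"routing this chat to the default agent '{agent}'.")


if __name__ == "__main__":
    agents = ["default-agent", "research-agent"]
    # Hypothetical misconfiguration: the homeserver suffix does not match the sender.
    user_agents = {"@user:example.org": "research-agent"}
    sender = "@user:matrix.lambda.coredump.ru"
    agent, fell_back = resolve_agent(sender, user_agents, agents)
    if fell_back:
        print(fallback_warning(sender, agent))
```

With this shape, a homeserver typo in the config never raises an error; it silently takes the fallback branch, which is why the in-band warning is the only visible signal of the mismatch.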
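The workspace-root attachment contract described above (no `incoming/` directory, no timestamp prefix, Windows-style suffixes such as `file (1).pdf` on collision) can be sketched in Python as follows. `save_upload` and its parameters are hypothetical. Two choices in the sketch are assumptions that the notes do not confirm for the real implementation: opening with mode `"xb"`, which makes the existence check and the file creation a single step so concurrent uploads with the same name cannot overwrite each other, and `Path(filename).name`, which strips any directory components a client might send.

```python
import tempfile
from pathlib import Path


def save_upload(root: Path, filename: str, data: bytes) -> Path:
    """Write an uploaded file into the workspace root without overwriting.

    Hypothetical helper: collisions become "name (1).ext", "name (2).ext", ...
    """
    name = Path(filename).name  # drop directory parts, e.g. "../x.pdf" -> "x.pdf"
    stem, suffix = Path(name).stem, Path(name).suffix
    n = 0
    while True:
        candidate = root / (name if n == 0 else f"{stem} ({n}){suffix}")
        try:
            # Mode "xb" creates the file and raises FileExistsError if it
            # already exists, so check-then-write cannot race.
            with open(candidate, "xb") as fh:
                fh.write(data)
            return candidate
        except FileExistsError:
            n += 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        for _ in range(3):
            print(save_upload(root, "file.pdf", b"%PDF").name)
        # prints file.pdf, then file (1).pdf, then file (2).pdf
```

Note that `Path.suffix` only takes the last extension, so `archive.tar.gz` would collide into `archive.tar (1).gz`.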
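The notes above only say that a path-based `base_url` normalization bug made WS connects drop route prefixes. One way to build the WebSocket URL without losing the prefix is sketched below in Python. `to_ws_url` and the `ws` endpoint name are assumptions for illustration, not the bot's real code. A common cause of this symptom is `urllib.parse.urljoin` with an absolute path, which discards the base path: `urljoin("https://example.org/agents/a1", "/ws")` yields `https://example.org/ws`.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ws_url(base_url: str, endpoint: str) -> str:
    """Turn an HTTP(S) base_url into a WS(S) URL, keeping any route prefix.

    Hypothetical helper; `endpoint` is not the bot's actual WS path.
    """
    parts = urlsplit(base_url)
    # Map http -> ws and https -> wss; leave other schemes untouched.
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    # Append to the existing path instead of replacing it, so a prefix such as
    # "/agents/a1" survives. Slashes are trimmed at the join to avoid "//".
    path = parts.path.rstrip("/") + "/" + endpoint.lstrip("/")
    return urlunsplit((scheme, parts.netloc, path, parts.query, ""))


if __name__ == "__main__":
    print(to_ws_url("https://example.org/agents/a1", "ws"))
    # prints wss://example.org/agents/a1/ws
```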