Add background process management with process tool, wait, PTY, and stdin support

New process registry and tool for managing long-running background processes across all terminal backends (local, Docker, Singularity, Modal, SSH). Process Registry (tools/process_registry.py): - ProcessSession tracking with rolling 200KB output buffer - spawn_local() with optional PTY via ptyprocess for interactive CLIs - spawn_via_env() for non-local backends (runs inside sandbox, never on host) - Background reader threads per process (Popen stdout or PTY) - wait() with timeout clamping, interrupt support, and transparent limit reporting - JSON checkpoint to ~/.hermes/processes.json for gateway crash recovery - Module-level singleton shared across agent loop, gateway, and RL Process Tool (model_tools.py): - 7 actions: list, poll, log, wait, kill, write, submit - Paired with terminal in all toolsets (CLI, messaging, RL) - Timeout clamping with transparent notes in response Terminal Tool Updates (tools/terminal_tool.py): - Replaced nohup background mode with registry spawn (returns session_id) - Added workdir parameter for per-command working directory - Added check_interval parameter for gateway auto-check watchers - Added pty parameter for interactive CLI tools (Codex, Claude Code) - Updated TERMINAL_TOOL_DESCRIPTION with full background workflow docs - Cleanup thread now respects active background processes (won't reap sandbox) Gateway Integration (gateway/run.py, session.py, config.py): - Session reset protection: sessions with active processes exempt from reset - Default idle timeout increased from 2 hours to 24 hours - from_dict fallback aligned to match (was 120, now 1440) - session_key env var propagated to process registry for session mapping - Crash recovery on gateway startup via checkpoint probe - check_interval watcher: asyncio task polls process, delivers updates to platform RL Safety (environments/): - tool_context.py cleanup() kills background processes on episode end - hermes_base_env.py warns when enabled_toolsets is None (loads all tools) - Process tool safe in RL via wait() blocking the agent loop Also: - Added ptyprocess as optional dependency (in pyproject.toml [pty] extra + [all]) - Fixed pre-existing bug: rl_test_inference missing from TOOL_TO_TOOLSET_MAP - Updated AGENTS.md with process management docs and project structure - Updated README.md terminal section with process management overview
2026-02-17 02:51:31 -08:00 · 2026-02-17 02:51:31 -08:00 · 061fa70907
commit 061fa70907
parent 48b5cfd085
12 changed files with 1142 additions and 40 deletions
--- a/tools/terminal_tool.py
+++ b/tools/terminal_tool.py
@ -1122,22 +1122,33 @@ TERMINAL_TOOL_DESCRIPTION = """Execute commands on a secure Linux environment.

 **Command Execution:**
 - Simple commands: Just provide the 'command' parameter
- Background processes: Set 'background': True for servers/long-running tasks
+- Background processes: Set 'background': true to get a session_id for monitoring via the 'process' tool
 - Command timeout: Optional 'timeout' parameter in seconds
+- Working directory: Optional 'workdir' parameter for per-command cwd
+- PTY mode: Set 'pty': true for interactive CLI tools (Codex, Claude Code, etc.)

 **Examples:**
 - Run command: `{"command": "ls -la"}`
- Background task: `{"command": "source venv/bin/activate && python server.py", "background": True}`
+- Background task: `{"command": "pytest -v tests/", "background": true}` -- returns session_id, use process tool to poll/wait/kill
+- With workdir: `{"command": "npm install", "workdir": "/home/user/project"}`
 - With timeout: `{"command": "long_task.sh", "timeout": 300}`
+- Interactive CLI: `{"command": "codex exec 'Add tests'", "background": true, "pty": true}`
+
+**Background Process Workflow:**
+1. Start: `terminal(command="...", background=true)` -- returns session_id
+2. Monitor: `process(action="poll", session_id="...")` -- check status + new output
+3. Wait: `process(action="wait", session_id="...", timeout=600)` -- block until done
+4. Interact: `process(action="write/submit", session_id="...", data="y")` -- send stdin
+5. Kill: `process(action="kill", session_id="...")` -- terminate

 **Best Practices:**
- Run servers/long processes in background
- Monitor disk usage for large tasks
+- Use background mode for long-running tasks, then process(wait) to block until completion
+- Use workdir to run commands in specific project directories
 - Install whatever tools you need with apt-get or pip
- Try to create or use a venv with uv or python -m venv to keep isolation from global system packages.
+- Try to create or use a venv with uv or python -m venv to keep isolation from global system packages

 **Things to avoid:**
- Do NOT use interactive tools such as tmux, vim, nano, python repl - you will get stuck.
+- Do NOT use interactive tools (vim, nano, python repl) without pty=true -- they will hang without a pseudo-terminal.
 - Even git sometimes becomes interactive if the output is large. If you're not sure, pipe to cat.
 """

@ -1295,6 +1306,16 @@ def _cleanup_inactive_envs(lifetime_seconds: int = 300):

    current_time = time.time()

+    # Check the process registry -- skip cleanup for sandboxes with active
+    # background processes (their _last_activity gets refreshed to keep them alive).
+    try:
+        from tools.process_registry import process_registry
+        for task_id in list(_last_activity.keys()):
+            if process_registry.has_active_processes(task_id):
+                _last_activity[task_id] = current_time  # Keep sandbox alive
+    except ImportError:
+        pass
+
    # Phase 1: collect stale entries and remove them from tracking dicts while
    # holding the lock.  Do NOT call env.cleanup() inside the lock -- Modal and
    # Docker teardown can block for 10-15s, which would stall every concurrent
@ -1501,7 +1522,10 @@ def terminal_tool(
    background: bool = False,
    timeout: Optional[int] = None,
    task_id: Optional[str] = None,
-    force: bool = False
+    force: bool = False,
+    workdir: Optional[str] = None,
+    check_interval: Optional[int] = None,
+    pty: bool = False,
 ) -> str:
    """
    Execute a command using mini-swe-agent's execution environments.
@ -1512,6 +1536,9 @@ def terminal_tool(
        timeout: Command timeout in seconds (default: from config)
        task_id: Unique identifier for environment isolation (optional)
        force: If True, skip dangerous command check (use after user confirms)
+        workdir: Working directory for this command (optional, uses session cwd if not set)
+        check_interval: Seconds between auto-checks for background processes (gateway only, min 30)
+        pty: If True, use pseudo-terminal for interactive CLI tools (local backend only)

    Returns:
        str: JSON string with output, exit_code, and error fields
@ -1662,20 +1689,69 @@ def terminal_tool(

        # Prepare command for execution
        if background:
-            # Run in background with nohup and redirect output
-            exec_command = f"nohup {command} > /tmp/bg_output.log 2>&1 &"
+            # Spawn a tracked background process via the process registry.
+            # For local backends: uses subprocess.Popen with output buffering.
+            # For non-local backends: runs inside the sandbox via env.execute().
+            from tools.process_registry import process_registry
+
+            session_key = os.getenv("HERMES_SESSION_KEY", "")
+            effective_cwd = workdir or cwd
            try:
-                result = env.execute(exec_command, timeout=10)
-                return json.dumps({
-                    "output": "Background task started successfully",
+                if env_type == "local":
+                    proc_session = process_registry.spawn_local(
+                        command=command,
+                        cwd=effective_cwd,
+                        task_id=effective_task_id,
+                        session_key=session_key,
+                        env_vars=env.env if hasattr(env, 'env') else None,
+                        use_pty=pty,
+                    )
+                else:
+                    proc_session = process_registry.spawn_via_env(
+                        env=env,
+                        command=command,
+                        cwd=effective_cwd,
+                        task_id=effective_task_id,
+                        session_key=session_key,
+                    )
+
+                result_data = {
+                    "output": "Background process started",
+                    "session_id": proc_session.id,
+                    "pid": proc_session.pid,
                    "exit_code": 0,
-                    "error": None
-                }, ensure_ascii=False)
+                    "error": None,
+                }
+
+                # Transparent timeout clamping note
+                max_timeout = effective_timeout
+                if timeout and timeout > max_timeout:
+                    result_data["timeout_note"] = (
+                        f"Requested timeout {timeout}s was clamped to "
+                        f"configured limit of {max_timeout}s"
+                    )
+
+                # Register check_interval watcher (gateway picks this up after agent run)
+                if check_interval and background:
+                    effective_interval = max(30, check_interval)
+                    if check_interval < 30:
+                        result_data["check_interval_note"] = (
+                            f"Requested {check_interval}s raised to minimum 30s"
+                        )
+                    process_registry.pending_watchers.append({
+                        "session_id": proc_session.id,
+                        "check_interval": effective_interval,
+                        "session_key": session_key,
+                        "platform": os.getenv("HERMES_SESSION_PLATFORM", ""),
+                        "chat_id": os.getenv("HERMES_SESSION_CHAT_ID", ""),
+                    })
+
+                return json.dumps(result_data, ensure_ascii=False)
            except Exception as e:
                return json.dumps({
                    "output": "",
                    "exit_code": -1,
-                    "error": f"Failed to start background task: {str(e)}"
+                    "error": f"Failed to start background process: {str(e)}"
                }, ensure_ascii=False)
        else:
            # Run foreground command with retry logic
@ -1685,7 +1761,10 @@ def terminal_tool(
            
            while retry_count <= max_retries:
                try:
-                    result = env.execute(command, timeout=effective_timeout)
+                    execute_kwargs = {"timeout": effective_timeout}
+                    if workdir:
+                        execute_kwargs["cwd"] = workdir
+                    result = env.execute(command, **execute_kwargs)
                except Exception as e:
                    error_str = str(e).lower()
                    if "timeout" in error_str: