Add todo tool for task management and enhance CLI features
- Introduced a new `todo_tool.py` for planning and tracking multi-step tasks, enhancing the agent's capabilities.
- Updated CLI to include a floating autocomplete dropdown for commands and improved user instructions for better navigation.
- Revised toolsets to incorporate the new `todo` tool and updated documentation to reflect changes in available tools and commands.
- Enhanced user experience with new keybindings and clearer command descriptions in the CLI.
parent 225ae32e7a
commit 9e85408c7b
7 changed files with 80 additions and 15 deletions
AGENTS.md (22 lines changed)
@ -25,6 +25,7 @@ hermes-agent/
 │ ├── uninstall.py # Uninstaller
 │ └── cron.py # Cron job management
 ├── tools/ # Tool implementations
+│ ├── todo_tool.py # Planning & task management (in-memory TodoStore)
 │ ├── process_registry.py # Background process management (spawn, poll, wait, kill)
 │ ├── transcription_tools.py # Speech-to-text (Whisper API)
 ├── gateway/ # Messaging platform adapters
@ -151,13 +152,23 @@ For models that support chain-of-thought reasoning:
 
 The interactive CLI uses:
 - **Rich** - For the welcome banner and styled panels
-- **prompt_toolkit** - For fixed input area with history and `patch_stdout`
-- **KawaiiSpinner** (in run_agent.py) - Animated feedback during API calls and tool execution
+- **prompt_toolkit** - For fixed input area with history, `patch_stdout`, slash command autocomplete, and floating completion menus
+- **KawaiiSpinner** (in run_agent.py) - Animated kawaii faces during API calls; clean `┊` activity feed for tool execution results
 
 Key components:
 - `HermesCLI` class - Main CLI controller with commands and conversation loop
+- `SlashCommandCompleter` - Autocomplete dropdown for `/commands` (type `/` to see all)
 - `load_cli_config()` - Loads config, sets environment variables for terminal
 - `build_welcome_banner()` - Displays ASCII art logo, tools, and skills summary
+
+CLI UX notes:
+- Thinking spinner (during LLM API call) shows animated kawaii face + verb (`(⌐■_■) deliberating...`)
+- When LLM returns tool calls, the spinner clears silently (no "got it!" noise)
+- Tool execution results appear as a clean activity feed: `┊ {emoji} {verb} {detail} {duration}`
+- "got it!" only appears when the LLM returns a final text response (`⚕ ready`)
+- The prompt shows `⚕ ❯` when the agent is working, `❯` when idle
+- Pasting 5+ lines auto-saves to `~/.hermes/pastes/` and collapses to a reference
+- Multi-line input via Alt+Enter or Ctrl+J
 - `/commands` - Process user commands like `/help`, `/clear`, `/personality`, etc.
 
 CLI uses `quiet_mode=True` when creating AIAgent to suppress verbose logging.
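The `SlashCommandCompleter` entry in the AGENTS.md hunk above describes an autocomplete dropdown that lists every command once `/` is typed. The prefix matching such a completer might perform can be sketched without any UI library, as below. This is an illustration only: the function name `complete_slash_command` and the `COMMANDS` table are hypothetical and not taken from the repository; the command names and descriptions are copied from the README table changed in this commit.

```python
from typing import Dict, List, Tuple

# Hypothetical command table; entries mirror a few rows of the README command table.
COMMANDS: Dict[str, str] = {
    "/help": "Show available commands",
    "/history": "Show conversation history",
    "/tools": "List available tools",
    "/toolsets": "List available toolsets",
    "/quit": "Exit",
}


def complete_slash_command(
    text: str, commands: Dict[str, str] = COMMANDS
) -> List[Tuple[str, str]]:
    """Return (command, description) pairs whose name starts with the typed text."""
    # Only complete while the user is still typing the command word itself;
    # once a space appears, the input is treated as command arguments.
    if not text.startswith("/") or " " in text:
        return []
    prefix = text.lower()
    return sorted(
        (name, desc) for name, desc in commands.items() if name.startswith(prefix)
    )
```

Typing `/` alone matches every entry, which corresponds to the "type `/` to see all" behaviour noted in the hunk. How the repository connects its completer to prompt_toolkit's floating menu is not shown in this diff.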
@ -472,7 +483,12 @@ Follow this strict order to maintain consistency:
 - Add to `OPTIONAL_ENV_VARS` in `hermes_cli/config.py`
 - The tool will be auto-disabled if the key is missing
 
-6. Optionally add to `toolset_distributions.py` for batch processing
+6. Add `"todo"` to the relevant platform toolsets (`hermes-cli`, `hermes-telegram`, etc.)
+
+7. Optionally add to `toolset_distributions.py` for batch processing
 
+**Special case: tools that need agent-level state** (like `todo`):
+If your tool needs access to the AIAgent instance (e.g., in-memory state per session), intercept it directly in `run_agent.py`'s tool dispatch loop *before* `handle_function_call()`. Add a fallback error in `handle_function_call()` for safety. See `todo_tool.py` and the `if function_name == "todo":` block in `run_agent.py` for the pattern. For RL environments, add the same intercept in `environments/agent_loop.py`.
+
 ### Tool Implementation Pattern
 
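The "special case" added to AGENTS.md above (tools that need agent-level state) can be illustrated with a minimal sketch. Only the names `todo`, `TodoStore`, and `handle_function_call` come from the diff. The `Agent` class, its `dispatch` method, the `read`/`write` methods, the `todos` argument, and the JSON result shape are assumptions made for illustration, not the repository's actual code.

```python
import json
from typing import Any, Dict, List


class TodoStore:
    """Hypothetical in-memory task list; the real class lives in todo_tool.py."""

    def __init__(self) -> None:
        self.items: List[Dict[str, str]] = []

    def write(self, todos: List[Dict[str, str]]) -> None:
        # Replace the whole list; copy each item so callers cannot mutate stored state.
        self.items = [dict(t) for t in todos]

    def read(self) -> List[Dict[str, str]]:
        return [dict(t) for t in self.items]


def handle_function_call(function_name: str, args: Dict[str, Any]) -> str:
    """Stand-in for the stateless dispatcher, including the fallback error."""
    if function_name == "todo":
        # Safety net: reaching this point means the agent loop missed the intercept.
        return json.dumps({"error": "todo must be handled by the agent loop"})
    return json.dumps({"error": f"unknown tool: {function_name}"})


class Agent:
    """Minimal stand-in for AIAgent that owns per-session state."""

    def __init__(self) -> None:
        self.todo_store = TodoStore()

    def dispatch(self, function_name: str, args: Dict[str, Any]) -> str:
        # Intercept before the generic dispatcher so the tool can reach agent state.
        if function_name == "todo":
            if args.get("todos") is not None:
                self.todo_store.write(args["todos"])
            return json.dumps({"todos": self.todo_store.read()})
        return handle_function_call(function_name, args)
```

Because each `Agent` owns its own store, two sessions never share a task list, and the fallback in `handle_function_call` turns a forgotten intercept into an explicit error instead of a silent failure.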
README.md (24 lines changed)
@ -107,16 +107,32 @@ hermes version # Show version info
 
 ### CLI Commands (inside chat)
 
+Type `/` to see an autocomplete dropdown of all commands.
+
 | Command | Description |
 |---------|-------------|
 | `/help` | Show available commands |
 | `/tools` | List available tools |
+| `/toolsets` | List available toolsets |
 | `/model [name]` | Show or change model |
+| `/prompt` | View/set custom system prompt |
 | `/personality [name]` | Set personality (kawaii, pirate, etc.) |
-| `/clear` | Clear screen and reset |
-| `/cron` | Manage scheduled tasks |
+| `/clear` | Clear screen and reset conversation |
+| `/history` | Show conversation history |
+| `/reset` | Reset conversation only (keep screen) |
+| `/retry` | Retry the last message |
+| `/undo` | Remove the last exchange |
+| `/save` | Save the current conversation |
 | `/config` | Show current configuration |
-| `/quit` | Exit |
+| `/cron` | Manage scheduled tasks |
+| `/platforms` | Show gateway/messaging platform status |
+| `/quit` | Exit (also: `/exit`, `/q`) |
 
+**Keybindings:**
+- `Enter` — send message
+- `Alt+Enter` or `Ctrl+J` — new line (multi-line input)
+- `Ctrl+C` — interrupt agent (double-press to force exit)
+- `Ctrl+D` — exit
+
 ---
 
@ -134,7 +150,7 @@ hermes --toolsets "web,terminal"
|
||||||
hermes --list-tools
|
hermes --list-tools
|
||||||
```
|
```
 
-**Available toolsets:** `web`, `terminal`, `browser`, `vision`, `creative`, `reasoning`, `skills`, `tts`, `cronjob`, and more.
+**Available toolsets:** `web`, `terminal`, `file`, `browser`, `vision`, `image_gen`, `moa`, `skills`, `tts`, `todo`, `cronjob`, and more.
 
 ### 🔊 Text-to-Speech
 
TODO.md (8 lines changed)
@ -4,7 +4,7 @@
 
 ## What We Already Have (for reference)
 
-**42+ tools** across 12 toolsets: web (search, extract), terminal + process management, file ops (read, write, patch, search), vision, MoA reasoning, image gen, browser (10 tools via Browserbase), skills (41 skills), cronjobs, RL training (10 tools via Tinker-Atropos), TTS, cross-channel messaging.
+**43+ tools** across 13 toolsets: web (search, extract), terminal + process management, file ops (read, write, patch, search), vision, MoA reasoning, image gen, browser (10 tools via Browserbase), skills (41 skills), **todo (task planning)**, cronjobs, RL training (10 tools via Tinker-Atropos), TTS, cross-channel messaging.
 
 **4 platform adapters**: Telegram, Discord, WhatsApp, Slack -- all with typing indicators, image/voice auto-analysis, dangerous command approval, interrupt support, background process watchers.
 
@ -41,9 +41,9 @@ The main agent becomes an orchestrator that delegates context-heavy tasks to sub
 
 ---
 
-## 2. Planning & Task Management 📋
+## 2. Planning & Task Management 📋 ✅
 
-**Status:** Not started
+**Status:** Implemented
 **Priority:** High -- every serious agent has this now
 
 A `todo` tool the agent uses to decompose complex tasks, track progress, and recover from failures. Must be **cache-friendly** -- no system prompt mutation, no injected messages that invalidate the KV cache prefix.
@ -935,7 +935,7 @@ This goes in the tool description:
 **Tier 1 (High impact, foundation for everything else):**
 1. Programmatic Tool Calling (code-mediated tool use) -- #20
 2. Memory System (Phase 1: MEMORY.md + USER.md) -- #5
-3. Planning & Task Management (todo tool) -- #2
+3. ~~Planning & Task Management (todo tool) -- #2~~ **DONE**
 4. Session Transcript Search -- #6
 5. Self-Learning from Errors -- #16
 
@ -185,15 +185,20 @@ agent:
 #
 # web - Web search and content extraction (web_search, web_extract)
 # search - Web search only, no scraping (web_search)
-# terminal - Command execution (terminal)
+# terminal - Command execution and process management (terminal, process)
+# file - File operations: read, write, patch, search
 # browser - Full browser automation (navigate, click, type, screenshot, etc.)
 # vision - Image analysis (vision_analyze)
 # image_gen - Image generation with FLUX (image_generate)
-# skills - Load skill documents (skills_categories, skills_list, skill_view)
+# skills - Load skill documents (skills_list, skill_view)
 # moa - Mixture of Agents reasoning (mixture_of_agents)
+# todo - Task planning and tracking for multi-step work
+# tts - Text-to-speech (Edge TTS free, ElevenLabs, OpenAI)
+# cronjob - Schedule and manage automated tasks (CLI-only)
+# rl - RL training tools (Tinker-Atropos)
 #
 # Composite toolsets:
-# debugging - terminal + web (for troubleshooting)
+# debugging - terminal + web + file (for troubleshooting)
 # safe - web + vision + moa (no terminal access)
 
 # -----------------------------------------------------------------------------
@ -47,6 +47,7 @@ async def web_search(query: str) -> dict:
 | **TTS** | `tts_tool.py` | `text_to_speech` (Edge TTS free / ElevenLabs / OpenAI) |
 | **Reasoning** | `mixture_of_agents_tool.py` | `mixture_of_agents` |
 | **Skills** | `skills_tool.py` | `skills_list`, `skill_view` |
+| **Todo** | `todo_tool.py` | `todo` (read/write task list for multi-step planning) |
 | **Cronjob** | `cronjob_tools.py` | `schedule_cronjob`, `list_cronjobs`, `remove_cronjob` |
 | **RL Training** | `rl_training_tool.py` | `rl_list_environments`, `rl_start_training`, `rl_check_status`, etc. |
 
@ -83,7 +84,11 @@ TOOLSETS = {
 },
 "terminal": {
 "description": "Command execution",
-"tools": ["terminal"]
+"tools": ["terminal", "process"]
+},
+"todo": {
+"description": "Task planning and tracking for multi-step work",
+"tools": ["todo"]
 },
 # ...
 }
@ -202,6 +202,9 @@ def _print_setup_summary(config: dict, hermes_home):
 # Terminal (always available if system deps met)
 tool_status.append(("Terminal/Commands", True, None))
 
+# Task planning (always available, in-memory)
+tool_status.append(("Task Planning (todo)", True, None))
+
 # Skills (always available if skills dir exists)
 tool_status.append(("Skills Knowledge Base", True, None))
 
toolsets.py (20 lines changed)
@ -189,6 +189,11 @@ TOOLSETS = {
 "image_generate",
 # Text-to-speech
 "text_to_speech",
+# Browser automation (requires Browserbase API key)
+"browser_navigate", "browser_snapshot", "browser_click",
+"browser_type", "browser_scroll", "browser_back",
+"browser_press", "browser_close", "browser_get_images",
+"browser_vision",
 # Skills - access knowledge base
 "skills_list", "skill_view",
 # Planning & task management
@ -216,6 +221,11 @@ TOOLSETS = {
 "image_generate",
 # Text-to-speech
 "text_to_speech",
+# Browser automation (requires Browserbase API key)
+"browser_navigate", "browser_snapshot", "browser_click",
+"browser_type", "browser_scroll", "browser_back",
+"browser_press", "browser_close", "browser_get_images",
+"browser_vision",
 # Skills - access knowledge base
 "skills_list", "skill_view",
 # Planning & task management
@ -243,6 +253,11 @@ TOOLSETS = {
 "image_generate",
 # Text-to-speech
 "text_to_speech",
+# Browser automation (requires Browserbase API key)
+"browser_navigate", "browser_snapshot", "browser_click",
+"browser_type", "browser_scroll", "browser_back",
+"browser_press", "browser_close", "browser_get_images",
+"browser_vision",
 # Skills
 "skills_list", "skill_view",
 # Planning & task management
@ -270,6 +285,11 @@ TOOLSETS = {
 "image_generate",
 # Text-to-speech
 "text_to_speech",
+# Browser automation (requires Browserbase API key)
+"browser_navigate", "browser_snapshot", "browser_click",
+"browser_type", "browser_scroll", "browser_back",
+"browser_press", "browser_close", "browser_get_images",
+"browser_vision",
 # Skills - access knowledge base
 "skills_list", "skill_view",
 # Planning & task management