| name | version | description | triggers | allowed-tools |
|---|---|---|---|---|
| browser-use | 1.1.0 | Run web automation tasks through browser-use and Chromium CDP (headless or GUI). | | |
Browser Use (Chromium/CDP)
Use this skill when a task requires real browser actions: open pages, click, type, submit forms, extract text/data, verify visible results.
Decision: when to use this skill
Use browser-use if user asks to:
- navigate websites step-by-step;
- interact with UI elements (buttons, inputs, dropdowns);
- extract structured content from rendered pages;
- complete multi-step flows (login/search/filter/checkout draft).
Do not use browser-use if task is:
- pure static fetch/API call (use lighter tools);
- local file manipulation only;
- impossible due to CAPTCHA/2FA/region lock without user intervention.
What the agent can and cannot see
Short answer to a common question: by default the agent sees the rendered page state, not the full JavaScript source.
The agent typically sees/uses:
- rendered DOM and interactive elements;
- visible text/content after JS execution;
- current URL, titles, form states;
- action results/errors returned by browser-use.
The agent does not automatically get:
- full source code of all loaded JS bundles;
- complete DevTools Network timeline;
- hidden backend logic not exposed in page content.
If the user asks about JS specifically, take these explicit steps:
- locate script URLs from page source/DOM;
- open script URL(s) directly;
- extract needed fragments (function names, endpoints, constants).
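The first of the steps above (locating script URLs) can be sketched in Python. This is a minimal illustration, assuming the page HTML is already available as a string; `ScriptSrcCollector` and `script_urls` are hypothetical helper names, not part of browser-use or of this skill's script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Inline scripts have no src attribute and are skipped.
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def script_urls(page_url, html):
    """Return absolute URLs of the external scripts referenced by the page."""
    collector = ScriptSrcCollector()
    collector.feed(html)
    # Relative paths are resolved against the page URL.
    return [urljoin(page_url, src) for src in collector.sources]


if __name__ == "__main__":
    page = '<head><script src="/static/app.js"></script><script>var x = 1;</script></head>'
    print(script_urls("https://example.com/index.html", page))
```

Each returned URL can then be opened directly, and the needed fragments (function names, endpoints, constants) searched for in the downloaded text.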
Runtime modes (CDP endpoints)
This project supports two modes.
- Headless browserless Chromium:
  - CDP: ws://chromium:3000/chromium?token=hermes-local
- GUI Chromium (visible in noVNC):
  - CDP: http://172.25.0.3:9223
  - Visual stream: http://localhost:6080/vnc.html
Notes:
- run_browser_use.py accepts both ws:// and http:// CDP URLs.
- For http://, the script resolves /json/version and converts it to a websocket URL automatically.
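The http:// to websocket conversion mentioned in the notes can be approximated as follows. This is a sketch under stated assumptions, not the code in run_browser_use.py: it relies on Chromium's /json/version endpoint returning a JSON document with a `webSocketDebuggerUrl` field, and the function names and the 5-second timeout are hypothetical.

```python
import json
from urllib.parse import urlsplit
from urllib.request import urlopen


def websocket_url_from_version(payload):
    """Extract the websocket endpoint from a /json/version response body."""
    data = json.loads(payload)
    return data["webSocketDebuggerUrl"]


def resolve_cdp_url(cdp_url, timeout=5.0):
    """Return a websocket CDP URL; ws:// and wss:// inputs pass through unchanged."""
    scheme = urlsplit(cdp_url).scheme
    if scheme in ("ws", "wss"):
        return cdp_url
    # Assumption: the HTTP endpoint serves Chromium's /json/version document.
    version_url = cdp_url.rstrip("/") + "/json/version"
    with urlopen(version_url, timeout=timeout) as response:
        return websocket_url_from_version(response.read().decode("utf-8"))
```

The websocket URL reported by the browser may name a host that is only reachable from inside the browser container. Whether the real script rewrites that host is not stated here, so the sketch returns it unchanged.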
Required environment
Minimum required env vars:
- OPENAI_API_KEY
- optional: OPENAI_BASE_URL
- optional: OPENAI_MODEL or BROWSER_USE_MODEL
- optional override: BROWSER_USE_CDP_URL
Defaults in this repo:
- BROWSER_USE_PYTHON=/opt/browser-use-venv/bin/python
- BROWSER_USE_CDP_URL=http://172.25.0.3:9223 (from docker-compose.yml)
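One way to read these variables in Python is sketched below. It is illustrative only: `load_settings` is a hypothetical helper, and the precedence of BROWSER_USE_MODEL over OPENAI_MODEL is an assumption, since the document does not say which one wins when both are set.

```python
import os

# Repo default for the CDP endpoint, as listed in this section.
DEFAULT_CDP_URL = "http://172.25.0.3:9223"


def load_settings(env=None):
    """Collect the skill's settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # The only variable documented as required.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("OPENAI_BASE_URL"),  # optional
        # Assumption: BROWSER_USE_MODEL takes precedence over OPENAI_MODEL.
        "model": env.get("BROWSER_USE_MODEL") or env.get("OPENAI_MODEL"),
        "cdp_url": env.get("BROWSER_USE_CDP_URL") or DEFAULT_CDP_URL,
    }
```

Passing a plain dict instead of the real environment makes the lookup easy to check without touching process state.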
Quick runbook (inside Docker)
- Ensure services are up:
docker compose --profile gui up -d
docker compose ps
- Check env in hermes-agent:
docker compose exec -T hermes-agent python - <<'PY'
import os
print('OPENAI_API_KEY', '<set>' if os.getenv('OPENAI_API_KEY') else '<missing>')
print('BROWSER_USE_CDP_URL', os.getenv('BROWSER_USE_CDP_URL', '<missing>'))
print('OPENAI_MODEL', os.getenv('OPENAI_MODEL', '<missing>'))
PY
- Run a task:
python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
--task "Open example.com and return page title" \
--max-steps 8
- For GUI visibility, open stream:
open "http://localhost:6080/vnc.html"
Runbook (outside Docker)
Use one combined command so env vars are available in the same process:
export OPENAI_API_KEY="$OPENAI_API_KEY" && \
export BROWSER_USE_CDP_URL="$BROWSER_USE_CDP_URL" && \
/opt/browser-use-venv/bin/python /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
--task "<task>" \
--max-steps 20
How Hermes should call this skill
Standard pattern:
python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
--task "<user task in plain language>" \
--max-steps 20
If the user gave a starting page, add --start-url:
python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
--task "Find contact email" \
--start-url "https://example.com" \
--max-steps 20
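When the call is made from Python rather than a shell, the same command line can be assembled as an argument list. This is a hedged sketch: `build_command` and `run_task` are hypothetical helpers, and the 300-second timeout is an assumed value, not one taken from this project.

```python
import subprocess

SCRIPT = "/root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py"


def build_command(task, start_url=None, max_steps=20):
    """Assemble the argument list for the documented call pattern."""
    command = ["python-browser-use", SCRIPT, "--task", task]
    if start_url:
        # Only added when the user supplied a starting page.
        command += ["--start-url", start_url]
    command += ["--max-steps", str(max_steps)]
    return command


def run_task(task, start_url=None, max_steps=20, timeout=300):
    """Run the script and return (exit_code, stdout). The timeout is an assumption."""
    completed = subprocess.run(
        build_command(task, start_url, max_steps),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.returncode, completed.stdout
```

Passing a list to `subprocess.run` avoids shell quoting problems when the task text contains quotes or spaces.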
Troubleshooting (symptom -> action)
{"success": false, "error": "OPENAI_API_KEY is not set"}
- check workspace/.env and hermes_data/.env;
- recreate container:
docker compose up -d --force-recreate hermes-agent
401 key_model_access_denied
- model is not allowed for API key;
- set BROWSER_USE_MODEL or OPENAI_MODEL to an allowed model.
Connection refused or CDP errors
- verify browser container is running:
docker compose ps
docker compose exec -T hermes-agent bash -lc 'curl -s http://172.25.0.3:9223/json/version | head'
Timeout / exit code 124
- not necessarily script failure;
- increase --max-steps and/or the task timeout envelope.
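The symptom-to-action table above can be expressed as a small lookup. This is only a sketch: `suggest_action` is a hypothetical helper, and it matches on the literal strings quoted in this section, which real error messages may not reproduce exactly.

```python
def suggest_action(exit_code, output):
    """Map a documented symptom to the action suggested in this section."""
    text = output.lower()
    if "openai_api_key is not set" in text:
        return "check workspace/.env and hermes_data/.env, then recreate hermes-agent"
    if "key_model_access_denied" in text:
        return "set BROWSER_USE_MODEL or OPENAI_MODEL to an allowed model"
    if "connection refused" in text:
        # Other CDP error wordings vary and are not matched here.
        return "verify the browser container is running and /json/version responds"
    if exit_code == 124:
        return "increase --max-steps and/or the task timeout"
    return "no documented match; inspect the raw output"
```

The exit code is checked last, so a concrete error message in the output takes priority over a generic timeout.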
Site-specific limitations
- Yandex Music: may be blocked by region.
- Wildberries: anti-bot/CAPTCHA may block automation.
When blocked by anti-bot/2FA/CAPTCHA:
- ask user for manual intervention;
- continue automation after challenge is passed;
- or switch to non-browser strategy if acceptable.
Operational notes
- Script file: /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py
- Script output: JSON (success, cdp_url, result.final_result, result.errors)
- In the current implementation use_vision=False, so decisions are based on browser-use structured state rather than visual screenshot reasoning.
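A caller can read the JSON output along these lines. This is a minimal sketch: `summarize_result` is a hypothetical helper, the sample payload is made up for illustration, and treating result.errors as a list is an assumption. The top-level error field is taken from the failure example in the troubleshooting section.

```python
import json


def summarize_result(stdout):
    """Pull the documented fields out of the script's JSON output."""
    payload = json.loads(stdout)
    result = payload.get("result") or {}
    return {
        "ok": bool(payload.get("success")),
        "cdp_url": payload.get("cdp_url"),
        "final_result": result.get("final_result"),
        # Assumption: result.errors is a list; a missing value becomes [].
        "errors": result.get("errors") or [],
        # Present in failure payloads such as the missing API key case.
        "error": payload.get("error"),
    }


if __name__ == "__main__":
    # Illustrative payload, not captured from a real run.
    sample = (
        '{"success": true, "cdp_url": "ws://chromium:3000/chromium", '
        '"result": {"final_result": "Example Domain", "errors": []}}'
    )
    print(summarize_result(sample)["final_result"])
```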