---
name: browser-use
version: "1.1.0"
description: Run web automation tasks through browser-use and Chromium CDP (headless or GUI).
triggers:
  - "browser-use"
  - "open website and extract"
  - "automate browser task"
  - "run browser task"
  - "открой сайт"  # "open the site"
  - "заполни форму"  # "fill in the form"
  - "найди на странице"  # "find on the page"
  - "сделай в браузере"  # "do it in the browser"
allowed-tools:
  - terminal
  - file
  - memory
---

# Browser Use (Chromium/CDP)

Use this skill when a task requires real browser actions: open pages, click, type, submit forms, extract text/data, verify visible results.

## Decision: when to use this skill

Use `browser-use` if the user asks to:
- navigate websites step-by-step;
- interact with UI elements (buttons, inputs, dropdowns);
- extract structured content from rendered pages;
- complete multi-step flows (login/search/filter/checkout draft).

Do **not** use `browser-use` if the task is:
- a pure static fetch/API call (use lighter tools);
- local file manipulation only;
- impossible due to CAPTCHA/2FA/region lock without user intervention.

## What the agent can and cannot see

Short answer to a common question: **the agent sees the rendered page state, not all JavaScript source by default**.

The agent typically sees/uses:
- rendered DOM and interactive elements;
- visible text/content after JS execution;
- current URL, titles, form states;
- action results/errors returned by browser-use.

The agent does **not automatically** get:
- full source code of all loaded JS bundles;
- complete DevTools Network timeline;
- hidden backend logic not exposed in page content.

If the user asks about JS specifically, take explicit steps:
1. locate script URLs from page source/DOM;
2. open script URL(s) directly;
3. extract needed fragments (function names, endpoints, constants).

## Runtime modes (CDP endpoints)

This project supports two modes.
1) Headless browserless Chromium:
   - CDP: `ws://chromium:3000/chromium?token=hermes-local`
2) GUI Chromium (visible in noVNC):
   - CDP: `http://172.25.0.3:9223`
   - Visual stream: `http://localhost:6080/vnc.html`

Notes:
- `run_browser_use.py` accepts both `ws://` and `http://` CDP URLs.
- For `http://`, the script resolves `/json/version` and converts the result to a websocket URL automatically.

## Required environment

Environment variables:
- required: `OPENAI_API_KEY`
- optional: `OPENAI_BASE_URL`
- optional: `OPENAI_MODEL` or `BROWSER_USE_MODEL`
- optional override: `BROWSER_USE_CDP_URL`

Defaults in this repo:
- `BROWSER_USE_PYTHON=/opt/browser-use-venv/bin/python`
- `BROWSER_USE_CDP_URL=http://172.25.0.3:9223` (from `docker-compose.yml`)

## Quick runbook (inside Docker)

1. Ensure services are up:

   ```bash
   docker compose --profile gui up -d
   docker compose ps
   ```

2. Check env in `hermes-agent`:

   ```bash
   docker compose exec -T hermes-agent python - <<'PY'
   import os
   # Report presence only; never print the key itself.
   print('OPENAI_API_KEY', 'set' if os.getenv('OPENAI_API_KEY') else 'missing')
   print('BROWSER_USE_CDP_URL', os.getenv('BROWSER_USE_CDP_URL', ''))
   print('OPENAI_MODEL', os.getenv('OPENAI_MODEL', ''))
   PY
   ```

3. Run a task:

   ```bash
   python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
     --task "Open example.com and return page title" \
     --max-steps 8
   ```

4. For GUI visibility, open the stream:

   ```bash
   open "http://localhost:6080/vnc.html"
   ```

## Runbook (outside Docker)

Use one combined command so the env vars are available in the same process (put the task text in the `--task` value):

```bash
export OPENAI_API_KEY="$OPENAI_API_KEY" && \
export BROWSER_USE_CDP_URL="$BROWSER_USE_CDP_URL" && \
/opt/browser-use-venv/bin/python /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
  --task "" \
  --max-steps 20
```

## How Hermes should call this skill

Standard pattern (put the task text in the `--task` value):

```bash
python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
  --task "" \
  --max-steps 20
```

If the user gave a starting page, add `--start-url`.

```bash
python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
  --task "Find contact email" \
  --start-url "https://example.com" \
  --max-steps 20
```

## Troubleshooting (symptom -> action)

`{"success": false, "error": "OPENAI_API_KEY is not set"}`

- check `workspace/.env` and `hermes_data/.env`;
- recreate the container:

```bash
docker compose up -d --force-recreate hermes-agent
```

`401 key_model_access_denied`

- the model is not allowed for the API key;
- set `BROWSER_USE_MODEL` or `OPENAI_MODEL` to an allowed model.

`Connection refused` or CDP errors

- verify the browser container is running:

```bash
docker compose ps
docker compose exec -T hermes-agent bash -lc 'curl -s http://172.25.0.3:9223/json/version | head'
```

Timeout / exit code `124`

- not necessarily a script failure;
- increase `--max-steps` and/or the timeout applied around the task run.

## Site-specific limitations

- Yandex Music: may be blocked by region.
- Wildberries: anti-bot/CAPTCHA may block automation.

When blocked by anti-bot/2FA/CAPTCHA:
- ask the user for manual intervention;
- continue automation after the challenge is passed;
- or switch to a non-browser strategy if acceptable.
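The symptom -> action entries and the blocking cases above can be folded into a small triage helper. The sketch below is a hypothetical illustration and not part of `run_browser_use.py`: the name `suggest_action`, the matching substrings, and the hint wording are assumptions chosen to mirror the entries in this document.

```python
from typing import Optional

# Hypothetical mapping: substring of the error text -> suggested next step.
# The substrings mirror the symptoms listed in this document.
ERROR_HINTS = [
    ("OPENAI_API_KEY is not set",
     "check the .env files and recreate the hermes-agent container"),
    ("key_model_access_denied",
     "set BROWSER_USE_MODEL or OPENAI_MODEL to an allowed model"),
    ("Connection refused",
     "verify the browser container is running and the CDP endpoint answers"),
    ("CAPTCHA",
     "ask the user for manual intervention, then continue"),
]

# This document treats exit code 124 as a timeout, not necessarily a failure.
TIMEOUT_EXIT_CODE = 124


def suggest_action(exit_code: int, error_text: str = "") -> Optional[str]:
    """Return a suggested next step, or None if no known symptom matches."""
    if exit_code == TIMEOUT_EXIT_CODE:
        return "increase --max-steps and/or the task timeout"
    lowered = error_text.lower()
    for needle, hint in ERROR_HINTS:
        # Case-insensitive substring match against the known symptoms.
        if needle.lower() in lowered:
            return hint
    return None


if __name__ == "__main__":
    print(suggest_action(1, "401 key_model_access_denied"))
```

A caller could pass the process exit code together with the `error` string from the script's JSON output; a `None` result means the failure is not one of the documented symptoms and needs manual inspection.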
## Operational notes

- Script file: `/root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py`
- Script output: JSON (`success`, `cdp_url`, `result.final_result`, `result.errors`)
- In current implementation `use_vision=False`, so decisions are based on browser-use structured state rather than visual screenshot reasoning.
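
To make the output shape concrete, here is a minimal sketch of how a caller might read the JSON the script prints. The field names come from the list above; the helper name `summarize_output`, the sample payload, and the assumption that `result.errors` is a list that may contain null entries are illustrative and not taken from the script.

```python
import json
from typing import Any, Dict, List


def summarize_output(raw: str) -> Dict[str, Any]:
    """Parse the JSON printed by the script into a flat summary dict."""
    data = json.loads(raw)
    # "result" may be absent in failure payloads, so fall back to {}.
    result = data.get("result") or {}
    # Assumption: errors is a list; drop empty or null entries.
    errors: List[str] = [str(e) for e in (result.get("errors") or []) if e]
    return {
        "ok": bool(data.get("success")),
        "cdp_url": data.get("cdp_url"),
        "final_result": result.get("final_result"),
        "errors": errors,
        # A top-level "error" appears in failure payloads such as
        # {"success": false, "error": "OPENAI_API_KEY is not set"}.
        "error": data.get("error"),
    }


if __name__ == "__main__":
    # Illustrative payload only; real values depend on the task.
    sample = json.dumps({
        "success": True,
        "cdp_url": "ws://chromium:3000/chromium?token=hermes-local",
        "result": {"final_result": "Example Domain", "errors": [None]},
    })
    print(summarize_output(sample))
```

Reading every field with `.get` keeps the helper tolerant of failure payloads that carry only `success` and `error`.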