Commit 74cb5455ca (update skill) by Кобылкевич Фёдор, 2026-03-26 23:28:59 +03:00


---
name: browser-use
version: 1.1.0
description: Run web automation tasks through browser-use and Chromium CDP (headless or GUI).
triggers:
  - browser-use
  - open website and extract
  - automate browser task
  - run browser task
  - открой сайт  # "open the website"
  - заполни форму  # "fill in the form"
  - найди на странице  # "find on the page"
  - сделай в браузере  # "do it in the browser"
allowed-tools:
  - terminal
  - file
  - memory
---

Browser Use (Chromium/CDP)

Use this skill when a task requires real browser actions: open pages, click, type, submit forms, extract text/data, verify visible results.

Decision: when to use this skill

Use browser-use if the user asks to:

  • navigate websites step-by-step;
  • interact with UI elements (buttons, inputs, dropdowns);
  • extract structured content from rendered pages;
  • complete multi-step flows (login/search/filter/checkout draft).

Do not use browser-use if the task is:

  • a pure static fetch or API call (use lighter tools);
  • local file manipulation only;
  • blocked by a CAPTCHA, 2FA, or a region lock that cannot be passed without user intervention.

What the agent can and cannot see

Short answer to a common question: by default, the agent sees the rendered page state, not the full JavaScript source.

The agent typically sees/uses:

  • rendered DOM and interactive elements;
  • visible text/content after JS execution;
  • current URL, titles, form states;
  • action results/errors returned by browser-use.

The agent does not automatically get:

  • full source code of all loaded JS bundles;
  • complete DevTools Network timeline;
  • hidden backend logic not exposed in page content.

If the user asks about JS specifically, take explicit steps:

  1. locate script URLs in the page source/DOM;
  2. open script URL(s) directly;
  3. extract needed fragments (function names, endpoints, constants).
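Steps 1 and 2 above can be sketched with the Python standard library. This is an illustration, not part of run_browser_use.py: it assumes you already have the page HTML as a string plus the page URL, and the helper name script_urls is hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class _ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <SCRIPT SRC=...> matches too.
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def script_urls(html: str, page_url: str) -> List[str]:
    """Return absolute URLs of external scripts referenced by the page (step 1)."""
    parser = _ScriptSrcCollector()
    parser.feed(html)
    parser.close()
    # Resolve relative paths against the page URL so each can be opened directly (step 2).
    return [urljoin(page_url, src) for src in parser.srcs]


if __name__ == "__main__":
    sample = '<script src="/static/app.js"></script><script>inline()</script>'
    print(script_urls(sample, "https://example.com/shop/"))
```

Inline scripts (no src) are skipped on purpose: their code is already visible in the page source.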

Runtime modes (CDP endpoints)

This project supports two modes.

  1. Headless browserless Chromium:
     • CDP: ws://chromium:3000/chromium?token=hermes-local
  2. GUI Chromium (visible in noVNC):
     • CDP: http://172.25.0.3:9223
     • Visual stream: http://localhost:6080/vnc.html

Notes:

  • run_browser_use.py accepts both ws:// and http:// CDP URLs.
  • For http:// URLs, the script requests /json/version from that endpoint and switches to the WebSocket URL it returns.
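That resolution step can be sketched as follows. It assumes the standard DevTools HTTP endpoint, whose /json/version response includes a webSocketDebuggerUrl field; the real logic in run_browser_use.py may differ, and resolve_cdp_url and its injectable fetch_json parameter are hypothetical names chosen so the function can be tested without a browser.

```python
import json
from typing import Callable
from urllib.request import urlopen


def _fetch_json(url: str) -> dict:
    """Default fetcher: GET the URL and decode the JSON body."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def resolve_cdp_url(cdp_url: str, fetch_json: Callable[[str], dict] = _fetch_json) -> str:
    """Return a WebSocket CDP URL; http(s):// endpoints are resolved via /json/version."""
    if cdp_url.startswith(("ws://", "wss://")):
        # Already a WebSocket endpoint, e.g. the headless browserless mode.
        return cdp_url
    if cdp_url.startswith(("http://", "https://")):
        info = fetch_json(cdp_url.rstrip("/") + "/json/version")
        ws_url = info.get("webSocketDebuggerUrl")
        if not ws_url:
            raise ValueError(f"no webSocketDebuggerUrl in /json/version from {cdp_url}")
        return ws_url
    raise ValueError(f"unsupported CDP URL scheme: {cdp_url}")
```

For example, resolve_cdp_url("http://172.25.0.3:9223") would return the browser-level ws:// URL of the GUI Chromium, while the headless ws:// endpoint is passed through unchanged.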

Required environment

Required env vars:

  • OPENAI_API_KEY

Optional:

  • OPENAI_BASE_URL
  • OPENAI_MODEL or BROWSER_USE_MODEL
  • BROWSER_USE_CDP_URL (overrides the default below)

Defaults in this repo:

  • BROWSER_USE_PYTHON=/opt/browser-use-venv/bin/python
  • BROWSER_USE_CDP_URL=http://172.25.0.3:9223 (from docker-compose.yml)
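A preflight check along these lines can confirm the environment before a run without ever printing secret values. It is a sketch: the variable names and the CDP default come from this document, while the function name preflight and the shape of its report are hypothetical.

```python
import os
from typing import Mapping

REQUIRED = ("OPENAI_API_KEY",)
OPTIONAL = ("OPENAI_BASE_URL", "OPENAI_MODEL", "BROWSER_USE_MODEL", "BROWSER_USE_CDP_URL")
# Repo default for BROWSER_USE_CDP_URL (set in docker-compose.yml, per this document).
DEFAULT_CDP_URL = "http://172.25.0.3:9223"


def preflight(env: Mapping[str, str]) -> dict:
    """Summarize which variables are set; only names are reported, never values."""
    missing = [name for name in REQUIRED if not env.get(name)]
    return {
        "ok": not missing,
        "missing": missing,
        "optional_set": [name for name in OPTIONAL if env.get(name)],
        # The CDP URL is not a secret, so the effective value is shown.
        "cdp_url": env.get("BROWSER_USE_CDP_URL") or DEFAULT_CDP_URL,
    }


if __name__ == "__main__":
    print(preflight(os.environ))
```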

Quick runbook (inside Docker)

  1. Ensure services are up:
docker compose --profile gui up -d
docker compose ps
  2. Check the environment in the hermes-agent container:
docker compose exec -T hermes-agent python - <<'PY'
import os
print('OPENAI_API_KEY', '<set>' if os.getenv('OPENAI_API_KEY') else '<missing>')
print('BROWSER_USE_CDP_URL', os.getenv('BROWSER_USE_CDP_URL', '<missing>'))
print('OPENAI_MODEL', os.getenv('OPENAI_MODEL', '<missing>'))
PY
  3. Run a task:
python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
  --task "Open example.com and return page title" \
  --max-steps 8
  4. For GUI visibility, open the stream (open is the macOS command; on Linux use xdg-open, or paste the URL into any browser):
open "http://localhost:6080/vnc.html"

Runbook (outside Docker)

Run everything as one combined command so the exported variables reach the script process started in the same shell:

export OPENAI_API_KEY="$OPENAI_API_KEY" && \
export BROWSER_USE_CDP_URL="$BROWSER_USE_CDP_URL" && \
/opt/browser-use-venv/bin/python /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
  --task "<task>" \
  --max-steps 20

How Hermes should call this skill

Standard pattern:

python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
  --task "<user task in plain language>" \
  --max-steps 20

If the user gave a starting page, add --start-url:

python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
  --task "Find contact email" \
  --start-url "https://example.com" \
  --max-steps 20
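When Hermes drives the script from Python rather than from a shell, the same call can be assembled as an argument list, which avoids quoting problems when the task text contains quotes. This sketch assumes python-browser-use is on PATH, as in the commands above, and uses only the flags shown in this section (--task, --start-url, --max-steps); build_command and run_task are hypothetical helper names.

```python
import subprocess
from typing import List, Optional, Sequence

SCRIPT = "/root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py"


def build_command(task: str, max_steps: int = 20, start_url: Optional[str] = None,
                  launcher: str = "python-browser-use") -> List[str]:
    """Build argv for run_browser_use.py using only the documented flags."""
    argv = [launcher, SCRIPT, "--task", task]
    if start_url:  # only when the user gave a starting page
        argv += ["--start-url", start_url]
    argv += ["--max-steps", str(max_steps)]
    return argv


def run_task(argv: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the command and capture its JSON stdout for later parsing.

    The child inherits this process's environment, so OPENAI_API_KEY and
    BROWSER_USE_CDP_URL must already be set here.
    """
    return subprocess.run(list(argv), capture_output=True, text=True, check=False)
```

Usage: run_task(build_command("Find contact email", start_url="https://example.com")).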

Troubleshooting (symptom -> action)

{"success": false, "error": "OPENAI_API_KEY is not set"}

  • check workspace/.env and hermes_data/.env;
  • recreate the container so it reloads the environment:
docker compose up -d --force-recreate hermes-agent

401 key_model_access_denied

  • the model is not allowed for this API key;
  • set BROWSER_USE_MODEL or OPENAI_MODEL to a model the key can access.

Connection refused or CDP errors

  • verify that the browser container is running and that its CDP endpoint answers from hermes-agent:
docker compose ps
docker compose exec -T hermes-agent bash -lc 'curl -s http://172.25.0.3:9223/json/version | head'

Timeout / exit code 124

  • usually the outer time limit expired, so this is not necessarily a script failure;
  • increase --max-steps and/or the outer timeout for the task.
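If the runner wraps the command in GNU coreutils timeout, 124 is the status timeout itself returns when its limit expires. The sketch below shows one way to tell that case apart from a normal exit; it assumes timeout is on PATH, and run_with_envelope and its result fields are hypothetical. Note that a child that happens to exit with 124 on its own is indistinguishable here.

```python
import subprocess
from typing import Sequence

TIMEOUT_EXIT = 124  # status GNU `timeout` returns when the time limit is reached


def run_with_envelope(argv: Sequence[str], seconds: int) -> dict:
    """Run argv under GNU `timeout` and report whether the envelope expired."""
    proc = subprocess.run(["timeout", str(seconds), *argv],
                          capture_output=True, text=True, check=False)
    if proc.returncode == TIMEOUT_EXIT:
        # The outer limit expired: retry with a larger envelope and/or --max-steps.
        return {"status": "timeout", "returncode": proc.returncode, "stdout": proc.stdout}
    # Any other code came from the command itself; inspect its JSON output.
    return {"status": "exited", "returncode": proc.returncode, "stdout": proc.stdout}
```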

Site-specific limitations

  • Yandex Music: may be unavailable due to regional restrictions.
  • Wildberries: anti-bot/CAPTCHA may block automation.

When blocked by anti-bot/2FA/CAPTCHA:

  • ask the user for manual intervention;
  • continue automation after the challenge is passed;
  • or switch to a non-browser strategy if acceptable.

Operational notes

  • Script file: /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py
  • Script output: JSON (success, cdp_url, result.final_result, result.errors)
  • In the current implementation use_vision=False, so the agent decides from browser-use's structured page state rather than by reasoning over screenshots.
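A minimal parser for that JSON output might look like this. It assumes result.errors is a list that may contain null entries for steps without errors, and that early failures use the top-level error shape shown under Troubleshooting; the exact schema comes from run_browser_use.py and may include more fields. The name summarize_output is hypothetical.

```python
import json
from typing import Any, Dict


def summarize_output(stdout: str) -> Dict[str, Any]:
    """Reduce the script's JSON output to the fields an agent usually needs."""
    data = json.loads(stdout)
    result = data.get("result") or {}
    # Drop null/empty entries (assumed to mean "no error on this step").
    errors = [e for e in (result.get("errors") or []) if e]
    if data.get("error"):
        # Top-level error, e.g. {"success": false, "error": "OPENAI_API_KEY is not set"}.
        errors.insert(0, data["error"])
    return {
        "success": bool(data.get("success")),
        "cdp_url": data.get("cdp_url"),
        "final_result": result.get("final_result"),
        "errors": errors,
    }
```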