update skill

2026-03-26 23:28:59 +03:00
commit 74cb5455ca
parent aa7927a316
7 changed files with 524 additions and 0 deletions
--- a/browser-use/SKILL.md
+++ b/browser-use/SKILL.md
@@ -0,0 +1,191 @@
+---
+name: browser-use
+version: "1.1.0"
+description: Run web automation tasks through browser-use and Chromium CDP (headless or GUI).
+triggers:
+  - "browser-use"
+  - "open website and extract"
+  - "automate browser task"
+  - "run browser task"
+  - "открой сайт"  # "open the website"
+  - "заполни форму"  # "fill in the form"
+  - "найди на странице"  # "find on the page"
+  - "сделай в браузере"  # "do it in the browser"
+allowed-tools:
+  - terminal
+  - file
+  - memory
+---
+
+# Browser Use (Chromium/CDP)
+
+Use this skill when a task requires real browser actions: open pages, click, type, submit forms, extract text/data, verify visible results.
+
+## Decision: when to use this skill
+
+Use `browser-use` if the user asks to:
+- navigate websites step-by-step;
+- interact with UI elements (buttons, inputs, dropdowns);
+- extract structured content from rendered pages;
+- complete multi-step flows (login/search/filter/checkout draft).
+
+Do **not** use `browser-use` if the task is:
+- a pure static fetch or API call (use lighter tools);
+- local file manipulation only;
+- impossible due to CAPTCHA/2FA/region lock without user intervention.
+
+## What the agent can and cannot see
+
+Short answer to a common question: **the agent sees the rendered page state, not all JavaScript source by default**.
+
+The agent typically sees/uses:
+- rendered DOM and interactive elements;
+- visible text/content after JS execution;
+- current URL, titles, form states;
+- action results/errors returned by browser-use.
+
+The agent does **not automatically** get:
+- full source code of all loaded JS bundles;
+- complete DevTools Network timeline;
+- hidden backend logic not exposed in page content.
+
+If the user asks about JavaScript specifically, take these explicit steps:
+1. locate script URLs from page source/DOM;
+2. open script URL(s) directly;
+3. extract needed fragments (function names, endpoints, constants).
+
+## Runtime modes (CDP endpoints)
+
+This project supports two modes.
+
+1) Headless browserless Chromium:
+- CDP: `ws://chromium:3000/chromium?token=hermes-local`
+
+2) GUI Chromium (visible in noVNC):
+- CDP: `http://172.25.0.3:9223`
+- Visual stream: `http://localhost:6080/vnc.html`
+
+Notes:
+- `run_browser_use.py` accepts both `ws://` and `http://` CDP URLs.
+- For an `http://` URL, the script resolves `/json/version` and converts the result to a websocket URL automatically.
+
+## Required environment
+
+Environment variables:
+- required: `OPENAI_API_KEY`
+- optional: `OPENAI_BASE_URL`
+- optional: `OPENAI_MODEL` or `BROWSER_USE_MODEL`
+- optional override: `BROWSER_USE_CDP_URL`
+
+Defaults in this repo:
+- `BROWSER_USE_PYTHON=/opt/browser-use-venv/bin/python`
+- `BROWSER_USE_CDP_URL=http://172.25.0.3:9223` (from `docker-compose.yml`)
+
+## Quick runbook (inside Docker)
+
+1. Ensure services are up:
+
+```bash
+docker compose --profile gui up -d
+docker compose ps
+```
+
+2. Check env in `hermes-agent`:
+
+```bash
+docker compose exec -T hermes-agent python - <<'PY'
+import os
+print('OPENAI_API_KEY', '<set>' if os.getenv('OPENAI_API_KEY') else '<missing>')
+print('BROWSER_USE_CDP_URL', os.getenv('BROWSER_USE_CDP_URL', '<missing>'))
+print('OPENAI_MODEL', os.getenv('OPENAI_MODEL', '<missing>'))
+PY
+```
+
+3. Run a task:
+
+```bash
+python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
+  --task "Open example.com and return page title" \
+  --max-steps 8
+```
+
+4. For GUI visibility, open stream:
+
+```bash
+# `open` is the macOS launcher; on Linux use `xdg-open` instead
+open "http://localhost:6080/vnc.html"
+```
+
+## Runbook (outside Docker)
+
+Use one combined command so env vars are available in the same process:
+
+```bash
+export OPENAI_API_KEY="$OPENAI_API_KEY" && \
+export BROWSER_USE_CDP_URL="$BROWSER_USE_CDP_URL" && \
+/opt/browser-use-venv/bin/python /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
+  --task "<task>" \
+  --max-steps 20
+```
+
+## How Hermes should call this skill
+
+Standard pattern:
+
+```bash
+python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
+  --task "<user task in plain language>" \
+  --max-steps 20
+```
+
+If the user gave a starting page, add `--start-url`:
+
+```bash
+python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
+  --task "Find contact email" \
+  --start-url "https://example.com" \
+  --max-steps 20
+```
+
+## Troubleshooting (symptom -> action)
+
+`{"success": false, "error": "OPENAI_API_KEY is not set"}`
+- check `workspace/.env` and `hermes_data/.env`;
+- recreate container:
+
+```bash
+docker compose up -d --force-recreate hermes-agent
+```
+
+`401 key_model_access_denied`
+- the model is not allowed for this API key;
+- set `BROWSER_USE_MODEL` or `OPENAI_MODEL` to an allowed model.
+
+`Connection refused` or CDP errors
+- verify the browser container is running and the CDP endpoint responds:
+
+```bash
+docker compose ps
+docker compose exec -T hermes-agent bash -lc 'curl -s http://172.25.0.3:9223/json/version | head'
+```
+
+Timeout / exit code `124`
+- not necessarily a script failure: `124` is the status GNU `timeout` returns when the time limit expires;
+- increase `--max-steps` and/or the task timeout envelope.
+
+## Site-specific limitations
+
+- Yandex Music: may be blocked by region.
+- Wildberries: anti-bot/CAPTCHA may block automation.
+
+When blocked by anti-bot/2FA/CAPTCHA:
+- ask user for manual intervention;
+- continue automation after challenge is passed;
+- or switch to non-browser strategy if acceptable.
+
+## Operational notes
+
+- Script file: `/root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py`
+- Script output: JSON (`success`, `cdp_url`, `result.final_result`, `result.errors`)
+- The current implementation sets `use_vision=False`, so decisions are based on browser-use structured state rather than on visual reasoning over screenshots.
+
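
Note on the `http://` CDP handling described under "Runtime modes": `run_browser_use.py` itself is not shown in this part of the diff, so the following is only a minimal sketch of how such a conversion might look, not the script's actual code. The function names `ws_url_from_version` and `resolve_cdp_url` are hypothetical. The sketch relies on Chromium's `/json/version` endpoint returning a `webSocketDebuggerUrl` field. Rewriting the host and port to match the original CDP URL is an assumption, added because a containerized browser may advertise an address such as `127.0.0.1` that the caller cannot reach.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def ws_url_from_version(cdp_url: str, version_info: dict) -> str:
    """Pick the websocket URL out of a /json/version payload.

    Assumption: the host and port are replaced with those of cdp_url, since
    the address the browser advertises may not be reachable from the caller.
    """
    ws = urlsplit(version_info["webSocketDebuggerUrl"])
    base = urlsplit(cdp_url)
    return urlunsplit((ws.scheme, base.netloc, ws.path, ws.query, ""))


def resolve_cdp_url(cdp_url: str, timeout: float = 5.0) -> str:
    """Return a websocket CDP URL for either a ws:// or an http:// endpoint."""
    if cdp_url.startswith(("ws://", "wss://")):
        return cdp_url  # already a websocket URL, nothing to resolve
    version_url = cdp_url.rstrip("/") + "/json/version"
    with urlopen(version_url, timeout=timeout) as resp:
        return ws_url_from_version(cdp_url, json.load(resp))
```

Keeping the payload handling in its own function makes it possible to check the URL rewrite without a running browser; only `resolve_cdp_url` touches the network, and only for `http://` inputs.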
+