update skill
parent aa7927a316
commit 74cb5455ca
7 changed files with 524 additions and 0 deletions
browser-use/SKILL.md (Normal file, 191 additions)
@@ -0,0 +1,191 @@
---
name: browser-use
version: "1.1.0"
description: Run web automation tasks through browser-use and Chromium CDP (headless or GUI).
triggers:
  - "browser-use"
  - "open website and extract"
  - "automate browser task"
  - "run browser task"
  - "открой сайт"  # "open the site"
  - "заполни форму"  # "fill in the form"
  - "найди на странице"  # "find on the page"
  - "сделай в браузере"  # "do it in the browser"
allowed-tools:
  - terminal
  - file
  - memory
---

# Browser Use (Chromium/CDP)

Use this skill when a task requires real browser actions: opening pages, clicking, typing, submitting forms, extracting text or data, and verifying visible results.

## Decision: when to use this skill

Use `browser-use` if the user asks to:
- navigate websites step-by-step;
- interact with UI elements (buttons, inputs, dropdowns);
- extract structured content from rendered pages;
- complete multi-step flows (login/search/filter/checkout draft).

Do **not** use `browser-use` if the task is:
- a pure static fetch or API call (use lighter tools);
- local file manipulation only;
- impossible without user intervention because of CAPTCHA, 2FA, or a region lock.

## What the agent can and cannot see

Short answer to a common question: **the agent sees the rendered page state, not all JavaScript source by default**.

The agent typically sees/uses:
- rendered DOM and interactive elements;
- visible text/content after JS execution;
- current URL, titles, form states;
- action results/errors returned by browser-use.

The agent does **not automatically** get:
- full source code of all loaded JS bundles;
- the complete DevTools Network timeline;
- hidden backend logic not exposed in page content.

If the user asks about JS specifically, take explicit steps:
1. locate script URLs from the page source/DOM;
2. open the script URL(s) directly;
3. extract the needed fragments (function names, endpoints, constants).

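Step 1 of the JS procedure can be sketched in a few lines of Python. This is only an illustration built on the standard library: `ScriptSrcCollector` and `script_urls` are hypothetical names, not part of browser-use or of `run_browser_use.py`, and the sample HTML is invented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.urls.append(urljoin(self.base_url, src))


def script_urls(page_html, base_url):
    """Return absolute URLs of external scripts referenced by page_html."""
    collector = ScriptSrcCollector(base_url)
    collector.feed(page_html)
    return collector.urls


# Invented sample page: one external script, one inline script (skipped).
sample = (
    '<html><head><script src="/static/app.js"></script>'
    "<script>var x = 1;</script></head></html>"
)
print(script_urls(sample, "https://example.com/page"))
```

Inline scripts have no `src`, so they are skipped here; their text is already part of the page source the agent can read.
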
## Runtime modes (CDP endpoints)

This project supports two modes.

1) Headless browserless Chromium:
- CDP: `ws://chromium:3000/chromium?token=hermes-local`

2) GUI Chromium (visible in noVNC):
- CDP: `http://172.25.0.3:9223`
- Visual stream: `http://localhost:6080/vnc.html`

Notes:
- `run_browser_use.py` accepts both `ws://` and `http://` CDP URLs.
- For `http://`, the script resolves `/json/version` and converts it to a websocket URL automatically.

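The `http://` to websocket conversion might look roughly like the sketch below. The actual code in `run_browser_use.py` is not shown here, so this is an assumption about its shape: `resolve_cdp_ws_url` and `fetch_json` are hypothetical names, and the sketch relies only on the `webSocketDebuggerUrl` field that Chromium returns from `/json/version`.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def resolve_cdp_ws_url(cdp_url, fetch=fetch_json):
    """Return a websocket CDP URL for either kind of input (hypothetical helper)."""
    if cdp_url.startswith(("ws://", "wss://")):
        # Already a websocket endpoint: use it unchanged.
        return cdp_url
    # For http(s) endpoints, ask the browser for its websocket debugger URL.
    version_info = fetch(cdp_url.rstrip("/") + "/json/version")
    return version_info["webSocketDebuggerUrl"]


# Offline demo with a stub instead of a real browser; the returned id is invented.
def fake_fetch(url):
    return {"webSocketDebuggerUrl": "ws://172.25.0.3:9223/devtools/browser/demo-id"}


print(resolve_cdp_ws_url("ws://chromium:3000/chromium?token=hermes-local"))
print(resolve_cdp_ws_url("http://172.25.0.3:9223", fetch=fake_fetch))
```

Passing `fetch` as a parameter keeps the sketch testable without a running Chromium container.
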
## Required environment

Env vars (only `OPENAI_API_KEY` is required):
- `OPENAI_API_KEY`
- optional: `OPENAI_BASE_URL`
- optional: `OPENAI_MODEL` or `BROWSER_USE_MODEL`
- optional override: `BROWSER_USE_CDP_URL`

Defaults in this repo:
- `BROWSER_USE_PYTHON=/opt/browser-use-venv/bin/python`
- `BROWSER_USE_CDP_URL=http://172.25.0.3:9223` (from `docker-compose.yml`)

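As a rough illustration of how these variables could be read in one place, here is a minimal sketch. `load_settings` is a hypothetical helper, and the precedence of `BROWSER_USE_MODEL` over `OPENAI_MODEL` is an assumption; the real script may resolve them differently.

```python
import os


def load_settings(env=None):
    """Read the skill's env vars into a dict (hypothetical helper)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # Same message as the error shown in the troubleshooting section.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("OPENAI_BASE_URL"),
        # Assumption: BROWSER_USE_MODEL wins when both model vars are set.
        "model": env.get("BROWSER_USE_MODEL") or env.get("OPENAI_MODEL"),
        # Repo default from docker-compose.yml, used when no override is set.
        "cdp_url": env.get("BROWSER_USE_CDP_URL", "http://172.25.0.3:9223"),
    }


# Demo with an explicit dict so the real environment is not touched.
settings = load_settings({"OPENAI_API_KEY": "sk-demo", "OPENAI_MODEL": "demo-model"})
print(settings["model"], settings["cdp_url"])
```
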
## Quick runbook (inside Docker)

1. Ensure the services are up:

```bash
docker compose --profile gui up -d
docker compose ps
```

2. Check the env in `hermes-agent`:

```bash
docker compose exec -T hermes-agent python - <<'PY'
import os
print('OPENAI_API_KEY', '<set>' if os.getenv('OPENAI_API_KEY') else '<missing>')
print('BROWSER_USE_CDP_URL', os.getenv('BROWSER_USE_CDP_URL', '<missing>'))
print('OPENAI_MODEL', os.getenv('OPENAI_MODEL', '<missing>'))
PY
```

3. Run a task:

```bash
python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
  --task "Open example.com and return page title" \
  --max-steps 8
```

4. For GUI visibility, open the stream:

```bash
# Run on the host; `open` is the macOS launcher (on Linux, `xdg-open` is the usual equivalent).
open "http://localhost:6080/vnc.html"
```

## Runbook (outside Docker)

Use one combined command so that the exported env vars are visible to the script it launches:

```bash
export OPENAI_API_KEY="$OPENAI_API_KEY" && \
export BROWSER_USE_CDP_URL="$BROWSER_USE_CDP_URL" && \
/opt/browser-use-venv/bin/python /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
  --task "<task>" \
  --max-steps 20
```

## How Hermes should call this skill

Standard pattern:

```bash
python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
  --task "<user task in plain language>" \
  --max-steps 20
```

If the user gave a starting page, add `--start-url`.

```bash
python-browser-use /root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py \
  --task "Find contact email" \
  --start-url "https://example.com" \
  --max-steps 20
```

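When the call is assembled programmatically, building the argument list explicitly avoids shell-quoting problems with free-form task text. The sketch below is one way to do it; `build_command` is a hypothetical helper, and only the flags already shown above (`--task`, `--max-steps`, `--start-url`) are used.

```python
import shlex

SCRIPT = "/root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py"


def build_command(task, max_steps=20, start_url=None):
    """Return the argv list for one browser-use run (hypothetical helper)."""
    argv = ["python-browser-use", SCRIPT, "--task", task, "--max-steps", str(max_steps)]
    if start_url:
        # Only pass --start-url when the user named a starting page.
        argv += ["--start-url", start_url]
    return argv


# Print a shell-safe rendering of the command for logging or copy-paste.
print(shlex.join(build_command("Find contact email", start_url="https://example.com")))
```

The returned list can be handed to `subprocess.run(argv, capture_output=True, text=True)` as-is, without `shell=True`.
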
## Troubleshooting (symptom -> action)

`{"success": false, "error": "OPENAI_API_KEY is not set"}`
- check `workspace/.env` and `hermes_data/.env`;
- recreate the container:

```bash
docker compose up -d --force-recreate hermes-agent
```

`401 key_model_access_denied`
- the model is not allowed for this API key;
- set `BROWSER_USE_MODEL` or `OPENAI_MODEL` to an allowed model.

`Connection refused` or CDP errors
- verify that the browser container is running and the CDP endpoint responds:

```bash
docker compose ps
docker compose exec -T hermes-agent bash -lc 'curl -s http://172.25.0.3:9223/json/version | head'
```

Timeout / exit code `124`
- not necessarily a script failure (`124` is the exit status GNU `timeout` returns when its time limit expires);
- increase `--max-steps` and/or the task timeout envelope.

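The symptom-to-action table above can also be expressed as a small lookup, which may be handy when the caller wants to react automatically. This is a sketch only: `suggest_action` is a hypothetical helper, and matching on substrings of the output is an assumption about how the errors surface.

```python
def suggest_action(exit_code, output):
    """Map a failed run to the matching troubleshooting hint (hypothetical helper)."""
    text = output.lower()
    if "openai_api_key is not set" in text:
        return "check workspace/.env and hermes_data/.env, then recreate hermes-agent"
    if "key_model_access_denied" in text:
        return "set BROWSER_USE_MODEL or OPENAI_MODEL to an allowed model"
    if "connection refused" in text:
        return "verify the browser container is running and /json/version responds"
    if exit_code == 124:
        return "increase --max-steps and/or the task timeout envelope"
    return "inspect result.errors in the script output"


print(suggest_action(1, '{"success": false, "error": "OPENAI_API_KEY is not set"}'))
print(suggest_action(124, ""))
```
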
## Site-specific limitations

- Yandex Music: may be blocked by region.
- Wildberries: anti-bot/CAPTCHA may block automation.

When blocked by anti-bot/2FA/CAPTCHA:
- ask the user for manual intervention;
- continue automation after the challenge is passed;
- or switch to a non-browser strategy if acceptable.

## Operational notes

- Script file: `/root/.hermes/skills/autonomous-ai-agents/browser-use/scripts/run_browser_use.py`
- Script output: JSON (`success`, `cdp_url`, `result.final_result`, `result.errors`)
- In the current implementation, `use_vision=False`, so decisions are based on browser-use's structured state rather than on visual screenshot reasoning.

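A caller can reduce the script's JSON output to the fields listed above with a few lines of Python. The sketch assumes only the keys named in this document (`success`, `cdp_url`, `result.final_result`, `result.errors`, plus the top-level `error` seen in the troubleshooting section); `summarize` is a hypothetical helper and the sample payload is invented.

```python
import json


def summarize(raw_output):
    """Extract the documented fields from the script's JSON output (hypothetical helper)."""
    payload = json.loads(raw_output)
    # `result` may be absent on early failures, so fall back to an empty dict.
    result = payload.get("result") or {}
    return {
        "ok": bool(payload.get("success")),
        "cdp_url": payload.get("cdp_url"),
        "final_result": result.get("final_result"),
        "errors": result.get("errors") or [],
        "error": payload.get("error"),
    }


# Invented sample payload using only the documented keys.
sample_output = json.dumps({
    "success": True,
    "cdp_url": "ws://chromium:3000/chromium?token=hermes-local",
    "result": {"final_result": "Example Domain", "errors": []},
})
print(summarize(sample_output)["final_result"])
```
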