feat(mcp): add sampling support — server-initiated LLM requests (#753)

Add MCP sampling/createMessage capability via SamplingHandler class. Text-only sampling + tool use in sampling with governance (rate limits, model whitelist, token caps, tool loop limits). Per-server audit metrics. Based on concept from PR #366 by eren-karakus0. Restructured as class-based design with bug fixes and tests using real MCP SDK types. 50 new tests, 2600 total passing.
2026-03-09 03:37:38 -07:00
commit 654e16187e
parent 1f0944de21
5 changed files with 1307 additions and 4 deletions
--- a/skills/mcp/native-mcp/SKILL.md
+++ b/skills/mcp/native-mcp/SKILL.md
@@ -321,6 +321,32 @@ mcp_servers:

 All tools from all servers are registered and available simultaneously. Each server's tools are prefixed with its name to avoid collisions.

+## Sampling (Server-Initiated LLM Requests)
+
+Hermes supports MCP's `sampling/createMessage` capability — MCP servers can request LLM completions through the agent during tool execution. This enables agent-in-the-loop workflows (data analysis, content generation, decision-making).
+
+Sampling is **enabled by default**. Configure per server:
+
+```yaml
+mcp_servers:
+  my_server:
+    command: "npx"
+    args: ["-y", "my-mcp-server"]
+    sampling:
+      enabled: true           # default: true
+      model: "gemini-3-flash" # model override (optional)
+      max_tokens_cap: 4096    # max tokens per request
+      timeout: 30             # LLM call timeout (seconds)
+      max_rpm: 10             # max requests per minute
+      allowed_models: []      # model whitelist (empty = all)
+      max_tool_rounds: 5      # tool loop limit (0 = disable)
+      log_level: "info"       # audit verbosity
+```
+
+Servers can also include `tools` in sampling requests for multi-turn tool-augmented workflows. The `max_tool_rounds` config prevents infinite tool loops. Per-server audit metrics (requests, errors, tokens, tool use count) are tracked via `get_mcp_status()`.
+
+Disable sampling for untrusted servers with `sampling: { enabled: false }`.
+
 ## Notes

 - MCP tools are called synchronously from the agent's perspective but run asynchronously on a dedicated background event loop