fix: strip ANSI at the source — clean terminal output before it reaches the model
Root cause: terminal_tool, execute_code, and process_registry returned raw subprocess output with ANSI escape sequences intact. The model saw these in tool results and copied them into file writes. Previous fix (PR #2532) stripped ANSI at the write point in file_tools.py, but this was a band-aid — regex on file content risks corrupting legitimate content, and doesn't prevent ANSI from wasting tokens in the model context. Source-level fix: - New tools/ansi_strip.py with comprehensive ECMA-48 regex covering CSI (incl. private-mode, colon-separated, intermediate bytes), OSC (both terminators), DCS/SOS/PM/APC strings, Fp/Fe/Fs/nF escapes, 8-bit C1 - terminal_tool.py: strip output before returning to model - code_execution_tool.py: strip stdout/stderr before returning - process_registry.py: strip output in poll/read_log/wait - file_tools.py: remove _strip_ansi band-aid (no longer needed) Verified: `ls --color=always` output returned as clean text to model, file written from that output contains zero ESC bytes.
This commit is contained in:
parent
6302e56e7c
commit
934fbe3c06
7 changed files with 236 additions and 21 deletions
44
tools/ansi_strip.py
Normal file
44
tools/ansi_strip.py
Normal file
|
|
@ -0,0 +1,44 @@
|
|||
"""Strip ANSI escape sequences from subprocess output.
|
||||
|
||||
Used by terminal_tool, code_execution_tool, and process_registry to clean
|
||||
command output before returning it to the model. This prevents ANSI codes
|
||||
from entering the model's context — which is the root cause of models
|
||||
copying escape sequences into file writes.
|
||||
|
||||
Covers the full ECMA-48 spec: CSI (including private-mode ``?`` prefix,
|
||||
colon-separated params, intermediate bytes), OSC (BEL and ST terminators),
|
||||
DCS/SOS/PM/APC string sequences, nF multi-byte escapes, Fp/Fe/Fs
|
||||
single-byte escapes, and 8-bit C1 control characters.
|
||||
"""
|
||||
|
||||
import re
|
||||
|
||||
_ANSI_ESCAPE_RE = re.compile(
|
||||
r"\x1b"
|
||||
r"(?:"
|
||||
r"\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]" # CSI sequence
|
||||
r"|\][\s\S]*?(?:\x07|\x1b\\)" # OSC (BEL or ST terminator)
|
||||
r"|[PX^_][\s\S]*?(?:\x1b\\)" # DCS/SOS/PM/APC strings
|
||||
r"|[\x20-\x2f]+[\x30-\x7e]" # nF escape sequences
|
||||
r"|[\x30-\x7e]" # Fp/Fe/Fs single-byte
|
||||
r")"
|
||||
r"|\x9b[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]" # 8-bit CSI
|
||||
r"|\x9d[\s\S]*?(?:\x07|\x9c)" # 8-bit OSC
|
||||
r"|[\x80-\x9f]", # Other 8-bit C1 controls
|
||||
re.DOTALL,
|
||||
)
|
||||
|
||||
# Fast-path check — skip full regex when no escape-like bytes are present.
|
||||
_HAS_ESCAPE = re.compile(r"[\x1b\x80-\x9f]")
|
||||
|
||||
|
||||
def strip_ansi(text: str) -> str:
|
||||
"""Remove ANSI escape sequences from text.
|
||||
|
||||
Returns the input unchanged (fast path) when no ESC or C1 bytes are
|
||||
present. Safe to call on any string — clean text passes through
|
||||
with negligible overhead.
|
||||
"""
|
||||
if not text or not _HAS_ESCAPE.search(text):
|
||||
return text
|
||||
return _ANSI_ESCAPE_RE.sub("", text)
|
||||
Loading…
Add table
Add a link
Reference in a new issue