Merge PR #192: fix(security): catch multi-word prompt injection bypass in skills_guard
Authored by 0xbyt4. The 'ignore ... instructions' regex only matched a single word between 'ignore' and the keyword (previous/all/above/prior). Multi-word variants like 'ignore all prior instructions' bypassed the scanner entirely.
This commit is contained in:
commit
520a26c48f
1 changed files with 1 additions and 1 deletions
|
|
@ -157,7 +157,7 @@ THREAT_PATTERNS = [
|
|||
"markdown link with variable interpolation"),
|
||||
|
||||
# ── Prompt injection ──
|
||||
(r'ignore\s+(previous|all|above|prior)\s+instructions',
|
||||
(r'ignore\s+(?:\w+\s+)*(previous|all|above|prior)\s+instructions',
|
||||
"prompt_injection_ignore", "critical", "injection",
|
||||
"prompt injection: ignore previous instructions"),
|
||||
(r'you\s+are\s+now\s+',
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue