Commit graph

3 commits

Author SHA1 Message Date
teknium1
d6c710706f docs: add real-world testing findings to OBLITERATUS skill
Added pitfalls discovered during live abliteration testing:
- Models < 1B have fragmented refusal, respond poorly (0.5B: 60%→20%)
- Models 3B+ work much better (3B: 75%→0% with advanced defaults)
- aggressive method can backfire on small models (made it worse)
- Spectral certification RED is common even when refusal rate is 0%
- Fixed torch property: total_mem → total_memory
2026-03-09 02:52:54 -07:00
teknium1
a6d3becd6a feat: update OBLITERATUS skill to v2.0 — match current repo state
Major updates to reflect the current OBLITERATUS codebase:

- Change default recommendation from 'informed' (experimental) to
  'advanced' (reliable, well-tested multi-direction SVD)
- Add new CLI commands: tourney, recommend, strategies, report,
  aggregate, abliterate (alias)
- Add --direction-method flag (diff_means, svd, leace)
- Add strategies module (embedding/FFN ablation, head pruning,
  layer removal)
- Add evaluation module with LM Eval Harness integration
- Expand analysis modules from 15 to 28
- Add Apple Silicon (MLX) support
- Add study presets (quick, jailbreak, knowledge, etc.)
- Add --contribute, --verify-sample-size, --preset flags
- Add complete CLI command reference table
- Fix torch property name: total_mem -> total_memory (caught
  during live testing)

Tested: Successfully abliterated Qwen2.5-0.5B-Instruct using
'advanced' method — refusal rate 0.4%, coherence 1.0, model
responds without refusal to test prompts.
2026-03-09 02:39:03 -07:00
teknium1
ab0f4126cf fix: restore all removed bundled skills + fix skills sync system
- Restored 21 skills removed in commits 757d012 and 740dd92:
  accelerate, audiocraft, code-review, faiss, flash-attention, gguf,
  grpo-rl-training, guidance, llava, nemo-curator, obliteratus, peft,
  pytorch-fsdp, pytorch-lightning, simpo, slime, stable-diffusion,
  tensorrt-llm, torchtitan, trl-fine-tuning, whisper

- Rewrote sync_skills() with proper update semantics:
  * New skills (not in manifest): copied to user dir
  * Existing skills (in manifest + on disk): updated via hash comparison
  * User-deleted skills (in manifest, not on disk): respected, not re-added
  * Stale manifest entries (removed from bundled): cleaned from manifest

- Added sync_skills() to CLI startup (cmd_chat) and gateway startup
  (start_gateway) — previously only ran during 'hermes update'

- Updated cmd_update output to show new/updated/cleaned counts

- Rewrote tests: 20 tests covering manifest CRUD, dir hashing, fresh
  install, user deletion respect, update detection, stale cleanup, and
  name collision handling

75 bundled skills total. 2002 tests pass.
2026-03-06 15:57:30 -08:00