refactor: reorganize skills into sub-categories

The skills directory was getting disorganized: mlops alone had 40 skills in a flat list, and 12 categories were singletons with just one skill each.

Code change:
- prompt_builder.py: Support sub-categories in skill scanner. skills/mlops/training/axolotl/SKILL.md now shows as category 'mlops/training' instead of just 'mlops'. Backwards-compatible with existing flat structure.

Split mlops (40 skills) into 7 sub-categories:
- mlops/training (12): accelerate, axolotl, flash-attention, grpo-rl-training, peft, pytorch-fsdp, pytorch-lightning, simpo, slime, torchtitan, trl-fine-tuning, unsloth
- mlops/inference (8): gguf, guidance, instructor, llama-cpp, obliteratus, outlines, tensorrt-llm, vllm
- mlops/models (6): audiocraft, clip, llava, segment-anything, stable-diffusion, whisper
- mlops/vector-databases (4): chroma, faiss, pinecone, qdrant
- mlops/evaluation (5): huggingface-tokenizers, lm-evaluation-harness, nemo-curator, saelens, weights-and-biases
- mlops/cloud (2): lambda-labs, modal
- mlops/research (1): dspy

Merged singleton categories:
- gifs → media (gif-search joins youtube-content)
- music-creation → media (heartmula, songsee)
- diagramming → creative (excalidraw joins ascii-art)
- ocr-and-documents → productivity
- domain → research (domain-intel)
- feeds → research (blogwatcher)
- market-data → research (polymarket)

Fixed misplaced skills:
- mlops/code-review → software-development (not ML-specific)
- mlops/ml-paper-writing → research (academic writing)

Added DESCRIPTION.md files for all new/updated categories.
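
For illustration only, the sub-category rule described above can be sketched as follows. This is not the code in prompt_builder.py; the function name `skill_category` and the `skills_root` parameter are hypothetical. The sketch assumes the category is every directory between the skills root and the skill's own directory, which also covers the existing flat layout.

```python
from pathlib import PurePosixPath


def skill_category(skill_md: str, skills_root: str = "skills") -> str:
    """Derive a category from a SKILL.md path (hypothetical helper).

    Assumes the layout <skills_root>/<category...>/<skill-name>/SKILL.md,
    where <category...> is one or more directory levels.
    """
    rel = PurePosixPath(skill_md).relative_to(skills_root)
    # Drop the last two parts: the skill's own directory and SKILL.md.
    category_parts = rel.parts[:-2]
    return "/".join(category_parts)


# Nested layout: category includes the sub-category.
print(skill_category("skills/mlops/training/axolotl/SKILL.md"))
# Flat layout: category is the single parent directory, as before.
print(skill_category("skills/media/gif-search/SKILL.md"))
```

Joining all intermediate directories, rather than taking only the first, is what keeps the flat structure working unchanged: a skill one level deep simply yields a one-part category.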
2026-03-09 03:35:53 -07:00 · 732c66b0f3
commit 732c66b0f3
parent d6c710706f
217 changed files with 39 additions and 4 deletions
--- a/skills/media/DESCRIPTION.md
+++ b/skills/media/DESCRIPTION.md
@@ -1 +1,3 @@
-Media content extraction and transformation tools — YouTube transcripts, audio, video processing.
+---
+description: Skills for working with media content — YouTube transcripts, GIF search, music generation, and audio visualization.
+---
--- a/skills/media/gif-search/SKILL.md
+++ b/skills/media/gif-search/SKILL.md
@@ -0,0 +1,73 @@
+---
+name: gif-search
+description: Search and download GIFs from Tenor using curl. No dependencies beyond curl and jq. Useful for finding reaction GIFs, creating visual content, and sending GIFs in chat.
+version: 1.0.0
+author: Hermes Agent
+license: MIT
+metadata:
+  hermes:
+    tags: [GIF, Media, Search, Tenor, API]
+---
+
+# GIF Search (Tenor API)
+
+Search and download GIFs directly via the Tenor API using curl. No extra tools needed.
+
+## Prerequisites
+
+- `curl` and `jq` (both standard on Linux)
+
+## Search for GIFs
+
+```bash
+# Search and get GIF URLs
+curl -s "https://tenor.googleapis.com/v2/search?q=thumbs+up&limit=5&key=AIzaSyAyimkuYQYF_FXVALexPuGQctUWRURdCYQ" | jq -r '.results[].media_formats.gif.url'
+
+# Get smaller/preview versions
+curl -s "https://tenor.googleapis.com/v2/search?q=nice+work&limit=3&key=AIzaSyAyimkuYQYF_FXVALexPuGQctUWRURdCYQ" | jq -r '.results[].media_formats.tinygif.url'
+```
+
+## Download a GIF
+
+```bash
+# Search and download the top result
+URL=$(curl -s "https://tenor.googleapis.com/v2/search?q=celebration&limit=1&key=AIzaSyAyimkuYQYF_FXVALexPuGQctUWRURdCYQ" | jq -r '.results[0].media_formats.gif.url')
+curl -sL "$URL" -o celebration.gif
+```
+
+## Get Full Metadata
+
+```bash
+curl -s "https://tenor.googleapis.com/v2/search?q=cat&limit=3&key=AIzaSyAyimkuYQYF_FXVALexPuGQctUWRURdCYQ" | jq '.results[] | {title: .title, url: .media_formats.gif.url, preview: .media_formats.tinygif.url, dimensions: .media_formats.gif.dims}'
+```
+
+## API Parameters
+
+| Parameter | Description |
+|-----------|-------------|
+| `q` | Search query (URL-encode spaces as `+`) |
+| `limit` | Max results (1-50, default 20) |
+| `key` | API key (the one above is Tenor's public demo key) |
+| `media_filter` | Filter formats: `gif`, `tinygif`, `mp4`, `tinymp4`, `webm` |
+| `contentfilter` | Safety: `off`, `low`, `medium`, `high` |
+| `locale` | Language: `en_US`, `es`, `fr`, etc. |
+
+## Available Media Formats
+
+Each result has multiple formats under `.media_formats`:
+
+| Format | Use case |
+|--------|----------|
+| `gif` | Full quality GIF |
+| `tinygif` | Small preview GIF |
+| `mp4` | Video version (smaller file size) |
+| `tinymp4` | Small preview video |
+| `webm` | WebM video |
+| `nanogif` | Tiny thumbnail |
+
+## Notes
+
+- The API key above is Tenor's public demo key — it works but has rate limits
+- URL-encode the query: spaces as `+`, special chars as `%XX`
+- For sending in chat, `tinygif` URLs are lighter weight
+- GIF URLs can be used directly in markdown: `![alt](url)`
--- a/skills/media/heartmula/SKILL.md
+++ b/skills/media/heartmula/SKILL.md
@@ -0,0 +1,170 @@
+---
+name: heartmula
+description: Set up and run HeartMuLa, the open-source music generation model family (Suno-like). Generates full songs from lyrics + tags with multilingual support.
+version: 1.0.0
+metadata:
+  hermes:
+    tags: [music, audio, generation, ai, heartmula, heartcodec, lyrics, songs]
+    related_skills: [audiocraft]
+---
+
+# HeartMuLa - Open-Source Music Generation
+
+## Overview
+HeartMuLa is a family of open-source music foundation models (Apache-2.0) that generates music conditioned on lyrics and tags. Comparable to Suno for open-source. Includes:
+- **HeartMuLa** - Music language model (3B/7B) for generation from lyrics + tags
+- **HeartCodec** - 12.5Hz music codec for high-fidelity audio reconstruction
+- **HeartTranscriptor** - Whisper-based lyrics transcription
+- **HeartCLAP** - Audio-text alignment model
+
+## When to Use
+- User wants to generate music/songs from text descriptions
+- User wants an open-source Suno alternative
+- User wants local/offline music generation
+- User asks about HeartMuLa, heartlib, or AI music generation
+
+## Hardware Requirements
+- **Minimum**: 8GB VRAM with `--lazy_load true` (loads/unloads models sequentially)
+- **Recommended**: 16GB+ VRAM for comfortable single-GPU usage
+- **Multi-GPU**: Use `--mula_device cuda:0 --codec_device cuda:1` to split across GPUs
+- 3B model with lazy_load peaks at ~6.2GB VRAM
+
+## Installation Steps
+
+### 1. Clone Repository
+```bash
+cd ~/  # or desired directory
+git clone https://github.com/HeartMuLa/heartlib.git
+cd heartlib
+```
+
+### 2. Create Virtual Environment (Python 3.10 required)
+```bash
+uv venv --python 3.10 .venv
+. .venv/bin/activate
+uv pip install -e .
+```
+
+### 3. Fix Dependency Compatibility Issues
+
+**IMPORTANT**: As of Feb 2026, the pinned dependencies have conflicts with newer packages. Apply these fixes:
+
+```bash
+# Upgrade datasets (old version incompatible with current pyarrow)
+uv pip install --upgrade datasets
+
+# Upgrade transformers (needed for huggingface-hub 1.x compatibility)
+uv pip install --upgrade transformers
+```
+
+### 4. Patch Source Code (Required for transformers 5.x)
+
+**Patch 1 - RoPE cache fix** in `src/heartlib/heartmula/modeling_heartmula.py`:
+
+In the `setup_caches` method of the `HeartMuLa` class, add RoPE reinitialization after the `reset_caches` try/except block and before the `with device:` block:
+
+```python
+# Re-initialize RoPE caches that were skipped during meta-device loading
+from torchtune.models.llama3_1._position_embeddings import Llama3ScaledRoPE
+for module in self.modules():
+    if isinstance(module, Llama3ScaledRoPE) and not module.is_cache_built:
+        module.rope_init()
+        module.to(device)
+```
+
+**Why**: `from_pretrained` creates model on meta device first; `Llama3ScaledRoPE.rope_init()` skips cache building on meta tensors, then never rebuilds after weights are loaded to real device.
+
+**Patch 2 - HeartCodec loading fix** in `src/heartlib/pipelines/music_generation.py`:
+
+Add `ignore_mismatched_sizes=True` to ALL `HeartCodec.from_pretrained()` calls (there are 2: the eager load in `__init__` and the lazy load in the `codec` property).
+
+**Why**: VQ codebook `initted` buffers have shape `[1]` in checkpoint vs `[]` in model. Same data, just scalar vs 0-d tensor. Safe to ignore.
+
+### 5. Download Model Checkpoints
+```bash
+cd heartlib  # project root
+hf download --local-dir './ckpt' 'HeartMuLa/HeartMuLaGen'
+hf download --local-dir './ckpt/HeartMuLa-oss-3B' 'HeartMuLa/HeartMuLa-oss-3B-happy-new-year'
+hf download --local-dir './ckpt/HeartCodec-oss' 'HeartMuLa/HeartCodec-oss-20260123'
+```
+
+All 3 can be downloaded in parallel. Total size is several GB.
+
+## GPU / CUDA
+
+HeartMuLa uses CUDA by default (`--mula_device cuda --codec_device cuda`). No extra setup needed if the user has an NVIDIA GPU with PyTorch CUDA support installed.
+
+- The installed `torch==2.4.1` includes CUDA 12.1 support out of the box
+- `torchtune` may report version `0.4.0+cpu` — this is just package metadata, it still uses CUDA via PyTorch
+- To verify GPU is being used, look for "CUDA memory" lines in the output (e.g. "CUDA memory before unloading: 6.20 GB")
+- **No GPU?** You can run on CPU with `--mula_device cpu --codec_device cpu`, but expect generation to be **extremely slow** (potentially 30-60+ minutes for a single song vs ~4 minutes on GPU). CPU mode also requires significant RAM (~12GB+ free). If the user has no NVIDIA GPU, recommend using a cloud GPU service (Google Colab free tier with T4, Lambda Labs, etc.) or the online demo at https://heartmula.github.io/ instead.
+
+## Usage
+
+### Basic Generation
+```bash
+cd heartlib
+. .venv/bin/activate
+python ./examples/run_music_generation.py \
+  --model_path=./ckpt \
+  --version="3B" \
+  --lyrics="./assets/lyrics.txt" \
+  --tags="./assets/tags.txt" \
+  --save_path="./assets/output.mp3" \
+  --lazy_load true
+```
+
+### Input Formatting
+
+**Tags** (comma-separated, no spaces):
+```
+piano,happy,wedding,synthesizer,romantic
+```
+or
+```
+rock,energetic,guitar,drums,male-vocal
+```
+
+**Lyrics** (use bracketed structural tags):
+```
+[Intro]
+
+[Verse]
+Your lyrics here...
+
+[Chorus]
+Chorus lyrics...
+
+[Bridge]
+Bridge lyrics...
+
+[Outro]
+```
+
+### Key Parameters
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `--max_audio_length_ms` | 240000 | Max length in ms (240s = 4 min) |
+| `--topk` | 50 | Top-k sampling |
+| `--temperature` | 1.0 | Sampling temperature |
+| `--cfg_scale` | 1.5 | Classifier-free guidance scale |
+| `--lazy_load` | false | Load/unload models on demand (saves VRAM) |
+| `--mula_dtype` | bfloat16 | Dtype for HeartMuLa (bf16 recommended) |
+| `--codec_dtype` | float32 | Dtype for HeartCodec (fp32 recommended for quality) |
+
+### Performance
+- RTF (Real-Time Factor) ≈ 1.0 — a 4-minute song takes ~4 minutes to generate
+- Output: MP3, 48kHz stereo, 128kbps
+
+## Pitfalls
+1. **Do NOT use bf16 for HeartCodec** — degrades audio quality. Use fp32 (default).
+2. **Tags may be ignored** — known issue (#90). Lyrics tend to dominate; experiment with tag ordering.
+3. **Triton not available on macOS** — Linux/CUDA only for GPU acceleration.
+4. **RTX 5080 incompatibility** reported in upstream issues.
+5. The dependency pin conflicts require the manual upgrades and patches described above.
+
+## Links
+- Repo: https://github.com/HeartMuLa/heartlib
+- Models: https://huggingface.co/HeartMuLa
+- Paper: https://arxiv.org/abs/2601.10547
+- License: Apache-2.0
--- a/skills/media/songsee/SKILL.md
+++ b/skills/media/songsee/SKILL.md
@@ -0,0 +1,80 @@
+---
+name: songsee
+description: Generate spectrograms and audio feature visualizations (mel, chroma, MFCC, tempogram, etc.) from audio files via CLI. Useful for audio analysis, music production debugging, and visual documentation.
+version: 1.0.0
+author: community
+license: MIT
+metadata:
+  hermes:
+    tags: [Audio, Visualization, Spectrogram, Music, Analysis]
+    homepage: https://github.com/steipete/songsee
+---
+
+# songsee
+
+Generate spectrograms and multi-panel audio feature visualizations from audio files.
+
+## Prerequisites
+
+Requires [Go](https://go.dev/doc/install):
+```bash
+go install github.com/steipete/songsee/cmd/songsee@latest
+```
+
+Optional: `ffmpeg` for formats beyond WAV/MP3.
+
+## Quick Start
+
+```bash
+# Basic spectrogram
+songsee track.mp3
+
+# Save to specific file
+songsee track.mp3 -o spectrogram.png
+
+# Multi-panel visualization grid
+songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux
+
+# Time slice (start at 12.5s, 8s duration)
+songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg
+
+# From stdin
+cat track.mp3 | songsee - --format png -o out.png
+```
+
+## Visualization Types
+
+Use `--viz` with comma-separated values:
+
+| Type | Description |
+|------|-------------|
+| `spectrogram` | Standard frequency spectrogram |
+| `mel` | Mel-scaled spectrogram |
+| `chroma` | Pitch class distribution |
+| `hpss` | Harmonic/percussive separation |
+| `selfsim` | Self-similarity matrix |
+| `loudness` | Loudness over time |
+| `tempogram` | Tempo estimation |
+| `mfcc` | Mel-frequency cepstral coefficients |
+| `flux` | Spectral flux (onset detection) |
+
+Multiple `--viz` types render as a grid in a single image.
+
+## Common Flags
+
+| Flag | Description |
+|------|-------------|
+| `--viz` | Visualization types (comma-separated) |
+| `--style` | Color palette: `classic`, `magma`, `inferno`, `viridis`, `gray` |
+| `--width` / `--height` | Output image dimensions |
+| `--window` / `--hop` | FFT window and hop size |
+| `--min-freq` / `--max-freq` | Frequency range filter |
+| `--start` / `--duration` | Time slice of the audio |
+| `--format` | Output format: `jpg` or `png` |
+| `-o` | Output file path |
+
+## Notes
+
+- WAV and MP3 are decoded natively; other formats require `ffmpeg`
+- Output images can be inspected with `vision_analyze` for automated audio analysis
+- Useful for comparing audio outputs, debugging synthesis, or documenting audio processing pipelines