refactor: reorganize skills into sub-categories

The skills directory was getting disorganized: mlops alone had 40 skills in a flat list, and 12 categories were singletons with just one skill each.

Code change:
- prompt_builder.py: Support sub-categories in skill scanner. skills/mlops/training/axolotl/SKILL.md now shows as category 'mlops/training' instead of just 'mlops'. Backwards-compatible with existing flat structure.

Split mlops (40 skills) into 7 sub-categories:
- mlops/training (12): accelerate, axolotl, flash-attention, grpo-rl-training, peft, pytorch-fsdp, pytorch-lightning, simpo, slime, torchtitan, trl-fine-tuning, unsloth
- mlops/inference (8): gguf, guidance, instructor, llama-cpp, obliteratus, outlines, tensorrt-llm, vllm
- mlops/models (6): audiocraft, clip, llava, segment-anything, stable-diffusion, whisper
- mlops/vector-databases (4): chroma, faiss, pinecone, qdrant
- mlops/evaluation (5): huggingface-tokenizers, lm-evaluation-harness, nemo-curator, saelens, weights-and-biases
- mlops/cloud (2): lambda-labs, modal
- mlops/research (1): dspy

Merged singleton categories:
- gifs → media (gif-search joins youtube-content)
- music-creation → media (heartmula, songsee)
- diagramming → creative (excalidraw joins ascii-art)
- ocr-and-documents → productivity
- domain → research (domain-intel)
- feeds → research (blogwatcher)
- market-data → research (polymarket)

Fixed misplaced skills:
- mlops/code-review → software-development (not ML-specific)
- mlops/ml-paper-writing → research (academic writing)

Added DESCRIPTION.md files for all new/updated categories.
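
The sub-category rule described in the commit message can be sketched as follows. This is a minimal illustration, not the actual `prompt_builder.py` code: the function name `skill_category` and the `skills_root` default are hypothetical, and the only assumption taken from the message is that every directory between the skills root and the skill's own directory forms the category.

```python
from pathlib import PurePosixPath


def skill_category(skill_md: str, skills_root: str = "skills") -> str:
    """Return the category for a SKILL.md path, e.g. 'mlops/training'."""
    rel = PurePosixPath(skill_md).relative_to(skills_root)
    # rel.parts looks like (<category parts...>, <skill-name>, "SKILL.md");
    # everything before the skill directory is treated as the category.
    category_parts = rel.parts[:-2]
    if not category_parts:
        raise ValueError(f"{skill_md} is not inside a category directory")
    return "/".join(category_parts)


# Nested layout yields a sub-category; the flat layout keeps working.
print(skill_category("skills/mlops/training/axolotl/SKILL.md"))  # mlops/training
print(skill_category("skills/media/gif-search/SKILL.md"))        # media
```

Deriving the category from the path depth, rather than from a fixed position, is what keeps the flat `skills/<category>/<skill>/SKILL.md` layout backwards-compatible in this sketch.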
parent d6c710706f
commit 732c66b0f3
217 changed files with 39 additions and 4 deletions
@@ -1 +1,3 @@
-Media content extraction and transformation tools - YouTube transcripts, audio, video processing.
+---
+description: Skills for working with media content - YouTube transcripts, GIF search, music generation, and audio visualization.
+---

skills/media/gif-search/SKILL.md (Normal file, 73 lines)
@@ -0,0 +1,73 @@
---
name: gif-search
description: Search and download GIFs from Tenor using curl. No dependencies beyond curl and jq. Useful for finding reaction GIFs, creating visual content, and sending GIFs in chat.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [GIF, Media, Search, Tenor, API]
---

# GIF Search (Tenor API)

Search and download GIFs directly via the Tenor API using `curl`, with `jq` to parse the JSON responses. No other tools are needed.

## Prerequisites

- `curl` and `jq` (`curl` ships with most Linux distributions; `jq` may need to be installed from the package manager)

## Search for GIFs

```bash
# Search and get GIF URLs
curl -s "https://tenor.googleapis.com/v2/search?q=thumbs+up&limit=5&key=AIzaSyAyimkuYQYF_FXVALexPuGQctUWRURdCYQ" | jq -r '.results[].media_formats.gif.url'

# Get smaller/preview versions
curl -s "https://tenor.googleapis.com/v2/search?q=nice+work&limit=3&key=AIzaSyAyimkuYQYF_FXVALexPuGQctUWRURdCYQ" | jq -r '.results[].media_formats.tinygif.url'
```

## Download a GIF

```bash
# Search and download the top result
URL=$(curl -s "https://tenor.googleapis.com/v2/search?q=celebration&limit=1&key=AIzaSyAyimkuYQYF_FXVALexPuGQctUWRURdCYQ" | jq -r '.results[0].media_formats.gif.url')
curl -sL "$URL" -o celebration.gif
```

## Get Full Metadata

```bash
curl -s "https://tenor.googleapis.com/v2/search?q=cat&limit=3&key=AIzaSyAyimkuYQYF_FXVALexPuGQctUWRURdCYQ" | jq '.results[] | {title: .title, url: .media_formats.gif.url, preview: .media_formats.tinygif.url, dimensions: .media_formats.gif.dims}'
```

## API Parameters

| Parameter | Description |
|-----------|-------------|
| `q` | Search query (URL-encode spaces as `+`) |
| `limit` | Max results (1-50, default 20) |
| `key` | API key (the one above is Tenor's public demo key) |
| `media_filter` | Filter formats: `gif`, `tinygif`, `mp4`, `tinymp4`, `webm` |
| `contentfilter` | Safety: `off`, `low`, `medium`, `high` |
| `locale` | Language: `en_US`, `es`, `fr`, etc. |
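
When the query comes from user input, building the URL programmatically avoids hand-encoding mistakes. The following is a minimal sketch using only the Python standard library; `build_search_url` and the `YOUR_API_KEY` placeholder are hypothetical names, and the endpoint and parameter names are the ones used in the curl examples above.

```python
from urllib.parse import urlencode

TENOR_SEARCH_URL = "https://tenor.googleapis.com/v2/search"


def build_search_url(query: str, key: str, limit: int = 5, **params: str) -> str:
    """Build a search URL; urlencode() encodes spaces as '+' and other chars as %XX."""
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    fields = {"q": query, "limit": limit, "key": key, **params}
    return f"{TENOR_SEARCH_URL}?{urlencode(fields)}"


url = build_search_url("thumbs up", key="YOUR_API_KEY", contentfilter="medium")
print(url)
# https://tenor.googleapis.com/v2/search?q=thumbs+up&limit=5&key=YOUR_API_KEY&contentfilter=medium
```

The resulting string can be passed to `curl -s "$URL"` exactly like the hand-written URLs above.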

## Available Media Formats

Each result has multiple formats under `.media_formats`:

| Format | Use case |
|--------|----------|
| `gif` | Full quality GIF |
| `tinygif` | Small preview GIF |
| `mp4` | Video version (smaller file size) |
| `tinymp4` | Small preview video |
| `webm` | WebM video |
| `nanogif` | Tiny thumbnail |
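
Not every format is guaranteed to be present in a given result, so a fallback order can be handy when post-processing the JSON outside of `jq`. The sketch below assumes the response shape implied by the `jq` paths above (`.results[].media_formats.<format>.url`); the `pick_url` helper, its preference order, and the sample data are hypothetical.

```python
from typing import Optional


def pick_url(result: dict, preferred: tuple = ("tinygif", "nanogif", "gif")) -> Optional[str]:
    """Return the URL of the first preferred format present in one result."""
    formats = result.get("media_formats", {})
    for name in preferred:
        url = formats.get(name, {}).get("url")
        if url:
            return url
    return None


# Hypothetical, trimmed-down response used only for illustration.
sample = {
    "results": [
        {"media_formats": {"gif": {"url": "https://example.com/a.gif"},
                           "tinygif": {"url": "https://example.com/a-tiny.gif"}}},
        {"media_formats": {"gif": {"url": "https://example.com/b.gif"}}},
    ]
}

print([pick_url(r) for r in sample["results"]])
# ['https://example.com/a-tiny.gif', 'https://example.com/b.gif']
```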

## Notes

- The API key above is Tenor's public demo key: it works but has rate limits
- URL-encode the query: spaces as `+`, special chars as `%XX`
- For sending in chat, `tinygif` URLs are lighter weight
- GIF URLs can be used directly in markdown: `![alt text](GIF_URL)`

skills/media/heartmula/SKILL.md (Normal file, 170 lines)
@@ -0,0 +1,170 @@
---
name: heartmula
description: Set up and run HeartMuLa, the open-source music generation model family (Suno-like). Generates full songs from lyrics + tags with multilingual support.
version: 1.0.0
metadata:
  hermes:
    tags: [music, audio, generation, ai, heartmula, heartcodec, lyrics, songs]
    related_skills: [audiocraft]
---

# HeartMuLa - Open-Source Music Generation

## Overview
HeartMuLa is a family of open-source music foundation models (Apache-2.0) that generate music conditioned on lyrics and tags, comparable to Suno but open source. It includes:
- **HeartMuLa** - Music language model (3B/7B) for generation from lyrics + tags
- **HeartCodec** - 12.5Hz music codec for high-fidelity audio reconstruction
- **HeartTranscriptor** - Whisper-based lyrics transcription
- **HeartCLAP** - Audio-text alignment model

## When to Use
- User wants to generate music/songs from text descriptions
- User wants an open-source Suno alternative
- User wants local/offline music generation
- User asks about HeartMuLa, heartlib, or AI music generation

## Hardware Requirements
- **Minimum**: 8GB VRAM with `--lazy_load true` (loads/unloads models sequentially)
- **Recommended**: 16GB+ VRAM for comfortable single-GPU usage
- **Multi-GPU**: Use `--mula_device cuda:0 --codec_device cuda:1` to split across GPUs
- The 3B model with `--lazy_load true` peaks at ~6.2GB VRAM
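
The guidance above can be turned into a small flag-selection helper. This is only an illustrative policy, not part of heartlib: the function name `suggest_device_flags` is hypothetical, the 16GB threshold comes from the "Recommended" line, and the flags themselves are the ones documented in this skill.

```python
def suggest_device_flags(gpu_vram_gb: list) -> list:
    """Map available GPUs (VRAM in GB, one entry per GPU) to CLI flags."""
    if not gpu_vram_gb:
        # No GPU: fall back to CPU, which works but is very slow.
        return ["--mula_device", "cpu", "--codec_device", "cpu"]
    if len(gpu_vram_gb) >= 2:
        # Two or more GPUs: put the language model and the codec on separate devices.
        return ["--mula_device", "cuda:0", "--codec_device", "cuda:1"]
    flags = ["--mula_device", "cuda", "--codec_device", "cuda"]
    if gpu_vram_gb[0] < 16:
        # Below the recommended VRAM: load/unload models sequentially.
        flags += ["--lazy_load", "true"]
    return flags


print(" ".join(suggest_device_flags([8.0])))
# --mula_device cuda --codec_device cuda --lazy_load true
```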

## Installation Steps

### 1. Clone Repository
```bash
cd ~/  # or desired directory
git clone https://github.com/HeartMuLa/heartlib.git
cd heartlib
```

### 2. Create Virtual Environment (Python 3.10 required)
```bash
uv venv --python 3.10 .venv
. .venv/bin/activate
uv pip install -e .
```

### 3. Fix Dependency Compatibility Issues

**IMPORTANT**: As of Feb 2026, the pinned dependencies have conflicts with newer packages. Apply these fixes:

```bash
# Upgrade datasets (old version incompatible with current pyarrow)
uv pip install --upgrade datasets

# Upgrade transformers (needed for huggingface-hub 1.x compatibility)
uv pip install --upgrade transformers
```

### 4. Patch Source Code (Required for transformers 5.x)

**Patch 1 - RoPE cache fix** in `src/heartlib/heartmula/modeling_heartmula.py`:

In the `setup_caches` method of the `HeartMuLa` class, add RoPE reinitialization after the `reset_caches` try/except block and before the `with device:` block:

```python
# Re-initialize RoPE caches that were skipped during meta-device loading
from torchtune.models.llama3_1._position_embeddings import Llama3ScaledRoPE
for module in self.modules():
    if isinstance(module, Llama3ScaledRoPE) and not module.is_cache_built:
        module.rope_init()
        module.to(device)
```

**Why**: `from_pretrained` first creates the model on the meta device. `Llama3ScaledRoPE.rope_init()` skips building its cache on meta tensors, and nothing rebuilds it after the weights are loaded onto the real device.

**Patch 2 - HeartCodec loading fix** in `src/heartlib/pipelines/music_generation.py`:

Add `ignore_mismatched_sizes=True` to ALL `HeartCodec.from_pretrained()` calls (there are 2: the eager load in `__init__` and the lazy load in the `codec` property).

**Why**: The VQ codebook `initted` buffers have shape `[1]` in the checkpoint but `[]` in the model. Both hold the same single value (a one-element 1-D tensor vs. a 0-d scalar tensor), so the mismatch is safe to ignore.

### 5. Download Model Checkpoints
```bash
cd heartlib  # project root
hf download --local-dir './ckpt' 'HeartMuLa/HeartMuLaGen'
hf download --local-dir './ckpt/HeartMuLa-oss-3B' 'HeartMuLa/HeartMuLa-oss-3B-happy-new-year'
hf download --local-dir './ckpt/HeartCodec-oss' 'HeartMuLa/HeartCodec-oss-20260123'
```

All 3 can be downloaded in parallel. Total size is several GB.

## GPU / CUDA

HeartMuLa uses CUDA by default (`--mula_device cuda --codec_device cuda`). No extra setup is needed if the user has an NVIDIA GPU with PyTorch CUDA support installed.

- The installed `torch==2.4.1` includes CUDA 12.1 support out of the box
- `torchtune` may report version `0.4.0+cpu`; this is just package metadata, and it still uses CUDA via PyTorch
- To verify the GPU is being used, look for "CUDA memory" lines in the output (e.g. "CUDA memory before unloading: 6.20 GB")
- **No GPU?** You can run on CPU with `--mula_device cpu --codec_device cpu`, but expect generation to be **extremely slow** (potentially 30-60+ minutes for a single song vs ~4 minutes on GPU). CPU mode also requires significant RAM (~12GB+ free). If the user has no NVIDIA GPU, recommend using a cloud GPU service (Google Colab free tier with T4, Lambda Labs, etc.) or the online demo at https://heartmula.github.io/ instead.

## Usage

### Basic Generation
```bash
cd heartlib
. .venv/bin/activate
python ./examples/run_music_generation.py \
  --model_path=./ckpt \
  --version="3B" \
  --lyrics="./assets/lyrics.txt" \
  --tags="./assets/tags.txt" \
  --save_path="./assets/output.mp3" \
  --lazy_load true
```

### Input Formatting

**Tags** (comma-separated, no spaces):
```
piano,happy,wedding,synthesizer,romantic
```
or
```
rock,energetic,guitar,drums,male-vocal
```

**Lyrics** (use bracketed structural tags):
```
[Intro]

[Verse]
Your lyrics here...

[Chorus]
Chorus lyrics...

[Bridge]
Bridge lyrics...

[Outro]
```
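
Because the tag and lyrics files are plain text, it can help to normalize them before a long generation run. The helpers below are a hypothetical sketch (neither `normalize_tags` nor `lyric_sections` exists in heartlib); they only apply the two formatting rules stated above.

```python
import re

# A structural tag is a line consisting solely of one bracketed label, e.g. "[Verse]".
SECTION_TAG = re.compile(r"^\[[^\[\]]+\]$")


def normalize_tags(raw: str) -> str:
    """Return tags comma-separated with no spaces around the commas."""
    tags = [t.strip() for t in raw.split(",")]
    return ",".join(t for t in tags if t)


def lyric_sections(lyrics: str) -> list:
    """List the bracketed structural tags in order of appearance."""
    return [ln.strip() for ln in lyrics.splitlines() if SECTION_TAG.match(ln.strip())]


print(normalize_tags("piano, happy , wedding"))  # piano,happy,wedding
print(lyric_sections("[Intro]\n\n[Verse]\nYour lyrics here...\n\n[Chorus]\nChorus lyrics..."))
# ['[Intro]', '[Verse]', '[Chorus]']
```

An empty result from `lyric_sections` is a quick signal that the lyrics file is missing its structural tags.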

### Key Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--max_audio_length_ms` | 240000 | Max length in ms (240s = 4 min) |
| `--topk` | 50 | Top-k sampling |
| `--temperature` | 1.0 | Sampling temperature |
| `--cfg_scale` | 1.5 | Classifier-free guidance scale |
| `--lazy_load` | false | Load/unload models on demand (saves VRAM) |
| `--mula_dtype` | bfloat16 | Dtype for HeartMuLa (bf16 recommended) |
| `--codec_dtype` | float32 | Dtype for HeartCodec (fp32 recommended for quality) |

### Performance
- RTF (Real-Time Factor) ≈ 1.0: a 4-minute song takes ~4 minutes to generate
- Output: MP3, 48kHz stereo, 128kbps
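
Since RTF is the ratio of generation time to audio duration, a rough wall-clock estimate follows directly from `--max_audio_length_ms`. The helper name below is hypothetical, and the default RTF of 1.0 is the GPU figure quoted above; actual times depend on hardware.

```python
def estimated_generation_seconds(max_audio_length_ms: int, rtf: float = 1.0) -> float:
    """Estimate wall-clock time, using RTF = generation time / audio duration."""
    return rtf * max_audio_length_ms / 1000.0


# Default 240000 ms at RTF 1.0 corresponds to about 4 minutes of generation.
print(estimated_generation_seconds(240_000) / 60)  # 4.0
```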

## Pitfalls
1. **Do NOT use bf16 for HeartCodec**: it degrades audio quality. Use fp32 (default).
2. **Tags may be ignored**: known issue (#90). Lyrics tend to dominate; experiment with tag ordering.
3. **Triton not available on macOS**: GPU acceleration is Linux/CUDA only.
4. **RTX 5080 incompatibility** reported in upstream issues.
5. The dependency pin conflicts require the manual upgrades and patches described above.

## Links
- Repo: https://github.com/HeartMuLa/heartlib
- Models: https://huggingface.co/HeartMuLa
- Paper: https://arxiv.org/abs/2601.10547
- License: Apache-2.0

skills/media/songsee/SKILL.md (Normal file, 80 lines)
@@ -0,0 +1,80 @@
---
name: songsee
description: Generate spectrograms and audio feature visualizations (mel, chroma, MFCC, tempogram, etc.) from audio files via CLI. Useful for audio analysis, music production debugging, and visual documentation.
version: 1.0.0
author: community
license: MIT
metadata:
  hermes:
    tags: [Audio, Visualization, Spectrogram, Music, Analysis]
    homepage: https://github.com/steipete/songsee
---

# songsee

Generate spectrograms and multi-panel audio feature visualizations from audio files.

## Prerequisites

Requires [Go](https://go.dev/doc/install):
```bash
go install github.com/steipete/songsee/cmd/songsee@latest
```

Optional: `ffmpeg` for formats beyond WAV/MP3.

## Quick Start

```bash
# Basic spectrogram
songsee track.mp3

# Save to specific file
songsee track.mp3 -o spectrogram.png

# Multi-panel visualization grid
songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux

# Time slice (start at 12.5s, 8s duration)
songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg

# From stdin
cat track.mp3 | songsee - --format png -o out.png
```

## Visualization Types

Use `--viz` with comma-separated values:

| Type | Description |
|------|-------------|
| `spectrogram` | Standard frequency spectrogram |
| `mel` | Mel-scaled spectrogram |
| `chroma` | Pitch class distribution |
| `hpss` | Harmonic/percussive separation |
| `selfsim` | Self-similarity matrix |
| `loudness` | Loudness over time |
| `tempogram` | Tempo estimation |
| `mfcc` | Mel-frequency cepstral coefficients |
| `flux` | Spectral flux (onset detection) |

Multiple `--viz` types render as a grid in a single image.

## Common Flags

| Flag | Description |
|------|-------------|
| `--viz` | Visualization types (comma-separated) |
| `--style` | Color palette: `classic`, `magma`, `inferno`, `viridis`, `gray` |
| `--width` / `--height` | Output image dimensions |
| `--window` / `--hop` | FFT window and hop size |
| `--min-freq` / `--max-freq` | Frequency range filter |
| `--start` / `--duration` | Time slice of the audio |
| `--format` | Output format: `jpg` or `png` |
| `-o` | Output file path |
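
When songsee is driven from a script, assembling the argument list in code avoids shell-quoting problems with file names. The sketch below is a hypothetical wrapper (the `songsee_command` helper is not part of songsee) that uses only flags listed in the table above.

```python
import shlex


def songsee_command(audio, output=None, viz=(), start=None, duration=None):
    """Build the songsee argument list from a few of the common flags."""
    cmd = ["songsee", audio]
    if viz:
        cmd += ["--viz", ",".join(viz)]
    if start is not None:
        cmd += ["--start", str(start)]
    if duration is not None:
        cmd += ["--duration", str(duration)]
    if output:
        cmd += ["-o", output]
    return cmd


cmd = songsee_command("track.mp3", output="slice.jpg", start=12.5, duration=8)
print(shlex.join(cmd))  # songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once songsee is installed.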

## Notes

- WAV and MP3 are decoded natively; other formats require `ffmpeg`
- Output images can be inspected with `vision_analyze` for automated audio analysis
- Useful for comparing audio outputs, debugging synthesis, or documenting audio processing pipelines