The architecture has been updated
parent
805f7a017e
commit
a01257ead9
1119 changed files with 226 additions and 352 deletions
235
hermes_code/optional-skills/research/bioinformatics/SKILL.md
Normal file
|
|
@ -0,0 +1,235 @@
|
|||
---
name: bioinformatics
description: Gateway to 400+ bioinformatics skills from bioSkills and ClawBio. Covers genomics, transcriptomics, single-cell, variant calling, pharmacogenomics, metagenomics, structural biology, and more. Fetches domain-specific reference material on demand.
version: 1.0.0
platforms: [linux, macos]
metadata:
  hermes:
    tags: [bioinformatics, genomics, sequencing, biology, research, science]
    category: research
---
|
||||
|
||||
# Bioinformatics Skills Gateway
|
||||
|
||||
Use when asked about bioinformatics, genomics, sequencing, variant calling, gene expression, single-cell analysis, protein structure, pharmacogenomics, metagenomics, phylogenetics, or any computational biology task.
|
||||
|
||||
This skill is a gateway to two open-source bioinformatics skill libraries. Instead of bundling hundreds of domain-specific skills, it indexes them and fetches what you need on demand.
|
||||
|
||||
## Sources
|
||||
|
||||
◆ **bioSkills** — 385 reference skills (code patterns, parameter guides, decision trees)
|
||||
Repo: https://github.com/GPTomics/bioSkills
|
||||
Format: SKILL.md per topic with code examples. Python/R/CLI.
|
||||
|
||||
◆ **ClawBio** — 33 runnable pipeline skills (executable scripts, reproducibility bundles)
|
||||
Repo: https://github.com/ClawBio/ClawBio
|
||||
Format: Python scripts with demos. Each analysis exports report.md + commands.sh + environment.yml.
|
||||
|
||||
## How to fetch and use a skill
|
||||
|
||||
1. Identify the domain and skill name from the index below.
|
||||
2. Clone the relevant repo (shallow clone to save time):
|
||||
```bash
# bioSkills (reference material)
git clone --depth 1 https://github.com/GPTomics/bioSkills.git /tmp/bioSkills

# ClawBio (runnable pipelines)
git clone --depth 1 https://github.com/ClawBio/ClawBio.git /tmp/ClawBio
```
|
||||
3. Read the specific skill:
|
||||
```bash
# bioSkills — each skill is at: <category>/<skill-name>/SKILL.md
cat /tmp/bioSkills/variant-calling/gatk-variant-calling/SKILL.md

# ClawBio — each skill is at: skills/<skill-name>/
cat /tmp/ClawBio/skills/pharmgx-reporter/README.md
```
|
||||
4. Follow the fetched skill as reference material. These are NOT Hermes-format skills — treat them as expert domain guides. They contain correct parameters, proper tool flags, and validated pipelines.
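Steps 1 to 3 can also be scripted. The following is a minimal Python sketch, not part of either library: the helper names `ensure_clone` and `skill_path` are hypothetical, the `/tmp` destinations mirror the bash example, and the path layouts (including a `README.md` inside each ClawBio skill directory) are assumed from the examples above and may change upstream.

```python
import subprocess
from pathlib import Path
from typing import Optional

# Repositories named in this document; local destinations mirror the bash example.
REPOS = {
    "bioSkills": ("https://github.com/GPTomics/bioSkills.git", Path("/tmp/bioSkills")),
    "ClawBio": ("https://github.com/ClawBio/ClawBio.git", Path("/tmp/ClawBio")),
}


def ensure_clone(library: str) -> Path:
    """Shallow-clone the library if it is not already present and return its root."""
    url, dest = REPOS[library]
    if not dest.exists():
        subprocess.run(["git", "clone", "--depth", "1", url, str(dest)], check=True)
    return dest


def skill_path(library: str, skill: str, category: Optional[str] = None) -> Path:
    """Build the expected path of a skill's entry document (layout assumed, see lead-in)."""
    root = REPOS[library][1]
    if library == "bioSkills":
        if category is None:
            raise ValueError("bioSkills skills live under <category>/<skill-name>/")
        return root / category / skill / "SKILL.md"
    return root / "skills" / skill / "README.md"
```

A typical call would be `ensure_clone("bioSkills")` followed by `skill_path("bioSkills", "gatk-variant-calling", category="variant-calling").read_text()`. Keeping path construction separate from cloning lets the agent check whether a skill exists before reading it.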
|
||||
|
||||
## Skill Index by Domain
|
||||
|
||||
### Sequence Fundamentals
|
||||
bioSkills:
|
||||
sequence-io/ — read-sequences, write-sequences, format-conversion, batch-processing, compressed-files, fastq-quality, filter-sequences, paired-end-fastq, sequence-statistics
|
||||
sequence-manipulation/ — seq-objects, reverse-complement, transcription-translation, motif-search, codon-usage, sequence-properties, sequence-slicing
|
||||
ClawBio:
|
||||
seq-wrangler — Sequence QC, alignment, and BAM processing (wraps FastQC, BWA, SAMtools)
|
||||
|
||||
### Read QC & Alignment
|
||||
bioSkills:
|
||||
read-qc/ — quality-reports, fastp-workflow, adapter-trimming, quality-filtering, umi-processing, contamination-screening, rnaseq-qc
|
||||
read-alignment/ — bwa-alignment, star-alignment, hisat2-alignment, bowtie2-alignment
|
||||
alignment-files/ — sam-bam-basics, alignment-sorting, alignment-filtering, bam-statistics, duplicate-handling, pileup-generation
|
||||
|
||||
### Variant Calling & Annotation
|
||||
bioSkills:
|
||||
variant-calling/ — gatk-variant-calling, deepvariant, variant-calling (bcftools), joint-calling, structural-variant-calling, filtering-best-practices, variant-annotation, variant-normalization, vcf-basics, vcf-manipulation, vcf-statistics, consensus-sequences, clinical-interpretation
|
||||
ClawBio:
|
||||
vcf-annotator — VEP + ClinVar + gnomAD annotation with ancestry-aware context
|
||||
variant-annotation — Variant annotation pipeline
|
||||
|
||||
### Differential Expression (Bulk RNA-seq)
|
||||
bioSkills:
|
||||
differential-expression/ — deseq2-basics, edger-basics, batch-correction, de-results, de-visualization, timeseries-de
|
||||
rna-quantification/ — alignment-free-quant (Salmon/kallisto), featurecounts-counting, tximport-workflow, count-matrix-qc
|
||||
expression-matrix/ — counts-ingest, gene-id-mapping, metadata-joins, sparse-handling
|
||||
ClawBio:
|
||||
rnaseq-de — Full DE pipeline with QC, normalization, and visualization
|
||||
diff-visualizer — Rich visualization and reporting for DE results
|
||||
|
||||
### Single-Cell RNA-seq
|
||||
bioSkills:
|
||||
single-cell/ — preprocessing, clustering, batch-integration, cell-annotation, cell-communication, doublet-detection, markers-annotation, trajectory-inference, multimodal-integration, perturb-seq, scatac-analysis, lineage-tracing, metabolite-communication, data-io
|
||||
ClawBio:
|
||||
scrna-orchestrator — Full Scanpy pipeline (QC, clustering, markers, annotation)
|
||||
scrna-embedding — scVI-based latent embedding and batch integration
|
||||
|
||||
### Spatial Transcriptomics
|
||||
bioSkills:
|
||||
spatial-transcriptomics/ — spatial-data-io, spatial-preprocessing, spatial-domains, spatial-deconvolution, spatial-communication, spatial-neighbors, spatial-statistics, spatial-visualization, spatial-multiomics, spatial-proteomics, image-analysis
|
||||
|
||||
### Epigenomics
|
||||
bioSkills:
|
||||
chip-seq/ — peak-calling, differential-binding, motif-analysis, peak-annotation, chipseq-qc, chipseq-visualization, super-enhancers
|
||||
atac-seq/ — atac-peak-calling, atac-qc, differential-accessibility, footprinting, motif-deviation, nucleosome-positioning
|
||||
methylation-analysis/ — bismark-alignment, methylation-calling, dmr-detection, methylkit-analysis
|
||||
hi-c-analysis/ — hic-data-io, tad-detection, loop-calling, compartment-analysis, contact-pairs, matrix-operations, hic-visualization, hic-differential
|
||||
ClawBio:
|
||||
methylation-clock — Epigenetic age estimation
|
||||
|
||||
### Pharmacogenomics & Clinical
|
||||
bioSkills:
|
||||
clinical-databases/ — clinvar-lookup, gnomad-frequencies, dbsnp-queries, pharmacogenomics, polygenic-risk, hla-typing, variant-prioritization, somatic-signatures, tumor-mutational-burden, myvariant-queries
|
||||
ClawBio:
|
||||
pharmgx-reporter — PGx report from 23andMe/AncestryDNA (12 genes, 31 SNPs, 51 drugs)
|
||||
drug-photo — Photo of medication → personalized PGx dosage card (via vision)
|
||||
clinpgx — ClinPGx API for gene-drug data and CPIC guidelines
|
||||
gwas-lookup — Federated variant lookup across 9 genomic databases
|
||||
gwas-prs — Polygenic risk scores from consumer genetic data
|
||||
nutrigx_advisor — Personalized nutrition from consumer genetic data
|
||||
|
||||
### Population Genetics & GWAS
|
||||
bioSkills:
|
||||
population-genetics/ — association-testing (PLINK GWAS), plink-basics, population-structure, linkage-disequilibrium, scikit-allel-analysis, selection-statistics
|
||||
causal-genomics/ — mendelian-randomization, fine-mapping, colocalization-analysis, mediation-analysis, pleiotropy-detection
|
||||
phasing-imputation/ — haplotype-phasing, genotype-imputation, imputation-qc, reference-panels
|
||||
ClawBio:
|
||||
claw-ancestry-pca — Ancestry PCA against SGDP reference panel
|
||||
|
||||
### Metagenomics & Microbiome
|
||||
bioSkills:
|
||||
metagenomics/ — kraken-classification, metaphlan-profiling, abundance-estimation, functional-profiling, amr-detection, strain-tracking, metagenome-visualization
|
||||
microbiome/ — amplicon-processing, diversity-analysis, differential-abundance, taxonomy-assignment, functional-prediction, qiime2-workflow
|
||||
ClawBio:
|
||||
claw-metagenomics — Shotgun metagenomics profiling (taxonomy, resistome, functional pathways)
|
||||
|
||||
### Genome Assembly & Annotation
|
||||
bioSkills:
|
||||
genome-assembly/ — hifi-assembly, long-read-assembly, short-read-assembly, metagenome-assembly, assembly-polishing, assembly-qc, scaffolding, contamination-detection
|
||||
genome-annotation/ — eukaryotic-gene-prediction, prokaryotic-annotation, functional-annotation, ncrna-annotation, repeat-annotation, annotation-transfer
|
||||
long-read-sequencing/ — basecalling, long-read-alignment, long-read-qc, clair3-variants, structural-variants, medaka-polishing, nanopore-methylation, isoseq-analysis
|
||||
|
||||
### Structural Biology & Chemoinformatics
|
||||
bioSkills:
|
||||
structural-biology/ — alphafold-predictions, modern-structure-prediction, structure-io, structure-navigation, structure-modification, geometric-analysis
|
||||
chemoinformatics/ — molecular-io, molecular-descriptors, similarity-searching, substructure-search, virtual-screening, admet-prediction, reaction-enumeration
|
||||
ClawBio:
|
||||
struct-predictor — Local AlphaFold/Boltz/Chai structure prediction with comparison
|
||||
|
||||
### Proteomics
|
||||
bioSkills:
|
||||
proteomics/ — data-import, peptide-identification, protein-inference, quantification, differential-abundance, dia-analysis, ptm-analysis, proteomics-qc, spectral-libraries
|
||||
ClawBio:
|
||||
proteomics-de — Proteomics differential expression
|
||||
|
||||
### Pathway Analysis & Gene Networks
|
||||
bioSkills:
|
||||
pathway-analysis/ — go-enrichment, gsea, kegg-pathways, reactome-pathways, wikipathways, enrichment-visualization
|
||||
gene-regulatory-networks/ — scenic-regulons, coexpression-networks, differential-networks, multiomics-grn, perturbation-simulation
|
||||
|
||||
### Immunoinformatics
|
||||
bioSkills:
|
||||
immunoinformatics/ — mhc-binding-prediction, epitope-prediction, neoantigen-prediction, immunogenicity-scoring, tcr-epitope-binding
|
||||
tcr-bcr-analysis/ — mixcr-analysis, scirpy-analysis, immcantation-analysis, repertoire-visualization, vdjtools-analysis
|
||||
|
||||
### CRISPR & Genome Engineering
|
||||
bioSkills:
|
||||
crispr-screens/ — mageck-analysis, jacks-analysis, hit-calling, screen-qc, library-design, crispresso-editing, base-editing-analysis, batch-correction
|
||||
genome-engineering/ — grna-design, off-target-prediction, hdr-template-design, base-editing-design, prime-editing-design
|
||||
|
||||
### Workflow Management
|
||||
bioSkills:
|
||||
workflow-management/ — snakemake-workflows, nextflow-pipelines, cwl-workflows, wdl-workflows
|
||||
ClawBio:
|
||||
repro-enforcer — Export any analysis as reproducibility bundle (Conda env + Singularity + checksums)
|
||||
galaxy-bridge — Access 8,000+ Galaxy tools from usegalaxy.org
|
||||
|
||||
### Specialized Domains
|
||||
bioSkills:
|
||||
alternative-splicing/ — splicing-quantification, differential-splicing, isoform-switching, sashimi-plots, single-cell-splicing, splicing-qc
|
||||
ecological-genomics/ — edna-metabarcoding, landscape-genomics, conservation-genetics, biodiversity-metrics, community-ecology, species-delimitation
|
||||
epidemiological-genomics/ — pathogen-typing, variant-surveillance, phylodynamics, transmission-inference, amr-surveillance
|
||||
liquid-biopsy/ — cfdna-preprocessing, ctdna-mutation-detection, fragment-analysis, tumor-fraction-estimation, methylation-based-detection, longitudinal-monitoring
|
||||
epitranscriptomics/ — m6a-peak-calling, m6a-differential, m6anet-analysis, merip-preprocessing, modification-visualization
|
||||
metabolomics/ — xcms-preprocessing, metabolite-annotation, normalization-qc, statistical-analysis, pathway-mapping, lipidomics, targeted-analysis, msdial-preprocessing
|
||||
flow-cytometry/ — fcs-handling, gating-analysis, compensation-transformation, clustering-phenotyping, differential-analysis, cytometry-qc, doublet-detection, bead-normalization
|
||||
systems-biology/ — flux-balance-analysis, metabolic-reconstruction, gene-essentiality, context-specific-models, model-curation
|
||||
rna-structure/ — secondary-structure-prediction, ncrna-search, structure-probing
|
||||
|
||||
### Data Visualization & Reporting
|
||||
bioSkills:
|
||||
data-visualization/ — ggplot2-fundamentals, heatmaps-clustering, volcano-customization, circos-plots, genome-browser-tracks, interactive-visualization, multipanel-figures, network-visualization, upset-plots, color-palettes, specialized-omics-plots, genome-tracks
|
||||
reporting/ — rmarkdown-reports, quarto-reports, jupyter-reports, automated-qc-reports, figure-export
|
||||
ClawBio:
|
||||
profile-report — Analysis profile reporting
|
||||
data-extractor — Extract numerical data from scientific figure images (via vision)
|
||||
lit-synthesizer — PubMed/bioRxiv search, summarization, citation graphs
|
||||
pubmed-summariser — Gene/disease PubMed search with structured briefing
|
||||
|
||||
### Database Access
|
||||
bioSkills:
|
||||
database-access/ — entrez-search, entrez-fetch, entrez-link, blast-searches, local-blast, sra-data, geo-data, uniprot-access, batch-downloads, interaction-databases, sequence-similarity
|
||||
ClawBio:
|
||||
ukb-navigator — Semantic search across 12,000+ UK Biobank fields
|
||||
clinical-trial-finder — Clinical trial discovery
|
||||
|
||||
### Experimental Design
|
||||
bioSkills:
|
||||
experimental-design/ — power-analysis, sample-size, batch-design, multiple-testing
|
||||
|
||||
### Machine Learning for Omics
|
||||
bioSkills:
|
||||
machine-learning/ — omics-classifiers, biomarker-discovery, survival-analysis, model-validation, prediction-explanation, atlas-mapping
|
||||
ClawBio:
|
||||
claw-semantic-sim — Semantic similarity index for disease literature (PubMedBERT)
|
||||
omics-target-evidence-mapper — Aggregate target-level evidence across omics sources
|
||||
|
||||
## Environment Setup
|
||||
|
||||
These skills assume a bioinformatics workstation. Common dependencies:
|
||||
|
||||
```bash
# Python
pip install biopython pysam cyvcf2 pybedtools pyBigWig scikit-allel anndata scanpy mygene

# R/Bioconductor (installs BiocManager first if it is missing)
Rscript -e 'if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager", repos = "https://cloud.r-project.org"); BiocManager::install(c("DESeq2","edgeR","Seurat","clusterProfiler","methylKit"))'

# CLI tools (Ubuntu/Debian)
sudo apt install samtools bcftools ncbi-blast+ minimap2 bedtools

# CLI tools (macOS)
brew install samtools bcftools blast minimap2 bedtools

# Or via Conda (recommended for reproducibility); Bioconda packages also need the conda-forge channel
conda install -c conda-forge -c bioconda samtools bcftools blast minimap2 bedtools fastp kraken2
```
|
||||
|
||||
## Pitfalls
|
||||
|
||||
- The fetched skills are NOT in Hermes SKILL.md format. They use their own structure (bioSkills: code pattern cookbooks; ClawBio: README + Python scripts). Read them as expert reference material.
|
||||
- bioSkills are reference guides — they show correct parameters and code patterns but aren't executable pipelines.
|
||||
- ClawBio skills are executable — many have `--demo` flags and can be run directly.
|
||||
- Both repos assume bioinformatics tools are installed. Check prerequisites before running pipelines.
|
||||
- For ClawBio, run `pip install -r requirements.txt` in the cloned repo first.
|
||||
- Genomic data files can be very large. Be mindful of disk space when downloading reference genomes, SRA datasets, or building indices.
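
The prerequisite check mentioned in the list above can be sketched in a few lines of Python. This is an illustrative helper, not part of bioSkills or ClawBio: `missing_tools` and `DEFAULT_TOOLS` are hypothetical names, and the tool list simply reflects the CLI packages from the Environment Setup section (the BLAST package provides `blastn`, among other executables).

```python
import shutil

# Executables provided by the CLI packages listed under Environment Setup.
DEFAULT_TOOLS = ["samtools", "bcftools", "blastn", "minimap2", "bedtools"]


def missing_tools(tools=DEFAULT_TOOLS):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` before a pipeline run gives a list to report back to the user; an empty list means every executable was found on `PATH`, though it says nothing about versions.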
|
||||
441
hermes_code/optional-skills/research/qmd/SKILL.md
Normal file
|
|
@ -0,0 +1,441 @@
|
|||
---
name: qmd
description: Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking. Supports CLI and MCP integration.
version: 1.0.0
author: Hermes Agent + Teknium
license: MIT
platforms: [macos, linux]
metadata:
  hermes:
    tags: [Search, Knowledge-Base, RAG, Notes, MCP, Local-AI]
    related_skills: [obsidian, native-mcp, arxiv]
---
|
||||
|
||||
# QMD — Query Markup Documents
|
||||
|
||||
Local, on-device search engine for personal knowledge bases. Indexes markdown
|
||||
notes, meeting transcripts, documentation, and any text-based files, then
|
||||
provides hybrid search combining keyword matching, semantic understanding, and
|
||||
LLM-powered reranking — all running locally with no cloud dependencies.
|
||||
|
||||
Created by [Tobi Lütke](https://github.com/tobi/qmd). MIT licensed.
|
||||
|
||||
## When to Use
|
||||
|
||||
- User asks to search their notes, docs, knowledge base, or meeting transcripts
|
||||
- User wants to find something across a large collection of markdown/text files
|
||||
- User wants semantic search ("find notes about X concept") not just keyword grep
|
||||
- User has already set up qmd collections and wants to query them
|
||||
- User asks to set up a local knowledge base or document search system
|
||||
- Keywords: "search my notes", "find in my docs", "knowledge base", "qmd"
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Node.js >= 22 (required)
|
||||
|
||||
```bash
|
||||
# Check version
|
||||
node --version # must be >= 22
|
||||
|
||||
# macOS — install or upgrade via Homebrew
|
||||
brew install node@22
|
||||
|
||||
# Linux — use NodeSource or nvm
|
||||
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
|
||||
sudo apt-get install -y nodejs
|
||||
# or with nvm:
|
||||
nvm install 22 && nvm use 22
|
||||
```
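
When the check has to happen from a script rather than by eye, the version comparison can be sketched as below. This is a minimal Python sketch under the assumption that `node --version` prints a string of the form `v22.3.0`; the function names are hypothetical.

```python
import re
import subprocess


def parse_node_major(version_text: str) -> int:
    """Extract the major version from `node --version` output such as 'v22.3.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognised Node.js version string: {version_text!r}")
    return int(match.group(1))


def node_is_supported(minimum_major: int = 22) -> bool:
    """Run `node --version` and compare majors; False if node is missing or fails."""
    try:
        result = subprocess.run(
            ["node", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return False
    return parse_node_major(result.stdout) >= minimum_major
```

Parsing is kept separate from the subprocess call so the comparison can be exercised without Node.js installed.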
|
||||
|
||||
### SQLite with Extension Support (macOS only)
|
||||
|
||||
macOS system SQLite lacks extension loading. Install via Homebrew:
|
||||
|
||||
```bash
|
||||
brew install sqlite
|
||||
```
|
||||
|
||||
### Install qmd
|
||||
|
||||
```bash
|
||||
npm install -g @tobilu/qmd
|
||||
# or with Bun:
|
||||
bun install -g @tobilu/qmd
|
||||
```
|
||||
|
||||
First run auto-downloads 3 local GGUF models (~2GB total):
|
||||
|
||||
| Model | Purpose | Size |
|-------|---------|------|
| embeddinggemma-300M-Q8_0 | Vector embeddings | ~300MB |
| qwen3-reranker-0.6b-q8_0 | Result reranking | ~640MB |
| qmd-query-expansion-1.7B | Query expansion | ~1.1GB |
|
||||
|
||||
### Verify Installation
|
||||
|
||||
```bash
|
||||
qmd --version
|
||||
qmd status
|
||||
```
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Command | What It Does | Speed |
|---------|-------------|-------|
| `qmd search "query"` | BM25 keyword search (no models) | ~0.2s |
| `qmd vsearch "query"` | Semantic vector search (1 model) | ~3s |
| `qmd query "query"` | Hybrid + reranking (all 3 models) | ~2-3s warm, ~19s cold |
| `qmd get <docid>` | Retrieve full document content | instant |
| `qmd multi-get "glob"` | Retrieve multiple files | instant |
| `qmd collection add <path> --name <n>` | Add a directory as a collection | instant |
| `qmd context add <path> "description"` | Add context metadata to improve retrieval | instant |
| `qmd embed` | Generate/update vector embeddings | varies |
| `qmd status` | Show index health and collection info | instant |
| `qmd mcp` | Start MCP server (stdio) | persistent |
| `qmd mcp --http --daemon` | Start MCP server (HTTP, warm models) | persistent |
|
||||
|
||||
## Setup Workflow
|
||||
|
||||
### 1. Add Collections
|
||||
|
||||
Point qmd at directories containing your documents:
|
||||
|
||||
```bash
|
||||
# Add a notes directory
|
||||
qmd collection add ~/notes --name notes
|
||||
|
||||
# Add project docs
|
||||
qmd collection add ~/projects/myproject/docs --name project-docs
|
||||
|
||||
# Add meeting transcripts
|
||||
qmd collection add ~/meetings --name meetings
|
||||
|
||||
# List all collections
|
||||
qmd collection list
|
||||
```
|
||||
|
||||
### 2. Add Context Descriptions
|
||||
|
||||
Context metadata helps the search engine understand what each collection
|
||||
contains. This significantly improves retrieval quality:
|
||||
|
||||
```bash
|
||||
qmd context add qmd://notes "Personal notes, ideas, and journal entries"
|
||||
qmd context add qmd://project-docs "Technical documentation for the main project"
|
||||
qmd context add qmd://meetings "Meeting transcripts and action items from team syncs"
|
||||
```
|
||||
|
||||
### 3. Generate Embeddings
|
||||
|
||||
```bash
|
||||
qmd embed
|
||||
```
|
||||
|
||||
This processes all documents in all collections and generates vector
|
||||
embeddings. Re-run after adding new documents or collections.
|
||||
|
||||
### 4. Verify
|
||||
|
||||
```bash
|
||||
qmd status # shows index health, collection stats, model info
|
||||
```
|
||||
|
||||
## Search Patterns
|
||||
|
||||
### Fast Keyword Search (BM25)
|
||||
|
||||
Best for: exact terms, code identifiers, names, known phrases.
|
||||
No models loaded — near-instant results.
|
||||
|
||||
```bash
|
||||
qmd search "authentication middleware"
|
||||
qmd search "handleError async"
|
||||
```
|
||||
|
||||
### Semantic Vector Search
|
||||
|
||||
Best for: natural language questions, conceptual queries.
|
||||
Loads embedding model (~3s first query).
|
||||
|
||||
```bash
|
||||
qmd vsearch "how does the rate limiter handle burst traffic"
|
||||
qmd vsearch "ideas for improving onboarding flow"
|
||||
```
|
||||
|
||||
### Hybrid Search with Reranking (Best Quality)
|
||||
|
||||
Best for: important queries where quality matters most.
|
||||
Uses all 3 models — query expansion, parallel BM25+vector, reranking.
|
||||
|
||||
```bash
|
||||
qmd query "what decisions were made about the database migration"
|
||||
```
|
||||
|
||||
### Structured Multi-Mode Queries
|
||||
|
||||
Combine different search types in a single query for precision:
|
||||
|
||||
```bash
|
||||
# BM25 for exact term + vector for concept
|
||||
qmd query $'lex: rate limiter\nvec: how does throttling work under load'
|
||||
|
||||
# With query expansion
|
||||
qmd query $'expand: database migration plan\nlex: "schema change"'
|
||||
```
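
When a structured query is assembled programmatically, the shell quoting above can be avoided by building the newline-separated string directly. The sketch below is illustrative: `build_structured_query` is a hypothetical helper, the accepted prefixes are only those that appear in this document's examples, and the one-line-per-sub-query rule is an assumption drawn from the format shown above.

```python
# Prefixes that appear in this document's examples.
KNOWN_PREFIXES = {"lex", "vec", "expand", "hyde"}


def build_structured_query(parts):
    """Join (prefix, text) pairs into one newline-separated query string."""
    lines = []
    for prefix, text in parts:
        if prefix not in KNOWN_PREFIXES:
            raise ValueError(f"unknown search prefix: {prefix!r}")
        if "\n" in text:
            # Assumption: each sub-query occupies exactly one line.
            raise ValueError("each sub-query must fit on a single line")
        lines.append(f"{prefix}: {text}")
    return "\n".join(lines)
```

The result can be passed as a single argument, for example `subprocess.run(["qmd", "query", build_structured_query(parts)])`, which sidesteps the `$'...'` quoting entirely.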
|
||||
|
||||
### Query Syntax (lex/BM25 mode)
|
||||
|
||||
| Syntax | Effect | Example |
|--------|--------|---------|
| `term` | Prefix match | `perf` matches "performance" |
| `"phrase"` | Exact phrase | `"rate limiter"` |
| `-term` | Exclude term | `performance -sports` |
|
||||
|
||||
### HyDE (Hypothetical Document Embeddings)
|
||||
|
||||
For complex topics, write what you expect the answer to look like:
|
||||
|
||||
```bash
|
||||
qmd query $'hyde: The migration plan involves three phases. First, we add the new columns without dropping the old ones. Then we backfill data. Finally we cut over and remove legacy columns.'
|
||||
```
|
||||
|
||||
### Scoping to Collections
|
||||
|
||||
```bash
|
||||
qmd search "query" --collection notes
|
||||
qmd query "query" --collection project-docs
|
||||
```
|
||||
|
||||
### Output Formats
|
||||
|
||||
```bash
|
||||
qmd search "query" --json # JSON output (best for parsing)
|
||||
qmd search "query" --limit 5 # Limit results
|
||||
qmd get "#abc123" # Get by document ID
|
||||
qmd get "path/to/file.md" # Get by file path
|
||||
qmd get "file.md:50" -l 100 # Start at line 50, return at most 100 lines
|
||||
qmd multi-get "journals/*.md" --json # Batch retrieve by glob
|
||||
```
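
The flags above combine naturally in a small wrapper. The following Python sketch only uses the subcommands and flags shown in this document (`--json`, `--collection`, `--limit`); the function names are hypothetical, and the shape of the JSON that qmd returns is deliberately not assumed.

```python
import json
import subprocess


def build_qmd_command(mode, query, collection=None, limit=None):
    """Build an argument list for `qmd search|vsearch|query` with JSON output."""
    if mode not in {"search", "vsearch", "query"}:
        raise ValueError(f"unsupported mode: {mode!r}")
    command = ["qmd", mode, query, "--json"]
    if collection is not None:
        command += ["--collection", collection]
    if limit is not None:
        command += ["--limit", str(limit)]
    return command


def run_qmd(mode, query, **options):
    """Run qmd and decode its JSON output; the result schema is not assumed here."""
    completed = subprocess.run(
        build_qmd_command(mode, query, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems with queries that contain quotes or spaces.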
|
||||
|
||||
## MCP Integration (Recommended)
|
||||
|
||||
qmd exposes an MCP server that provides search tools directly to
|
||||
Hermes Agent via the native MCP client. This is the preferred
|
||||
integration — once configured, the agent gets qmd tools automatically
|
||||
without needing to load this skill.
|
||||
|
||||
### Option A: Stdio Mode (Simple)
|
||||
|
||||
Add to `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
mcp_servers:
  qmd:
    command: "qmd"
    args: ["mcp"]
    timeout: 30
    connect_timeout: 45
```
|
||||
|
||||
This registers tools: `mcp_qmd_search`, `mcp_qmd_vsearch`,
|
||||
`mcp_qmd_deep_search`, `mcp_qmd_get`, `mcp_qmd_status`.
|
||||
|
||||
**Tradeoff:** Models load on first search call (~19s cold start),
|
||||
then stay warm for the session. Acceptable for occasional use.
|
||||
|
||||
### Option B: HTTP Daemon Mode (Fast, Recommended for Heavy Use)
|
||||
|
||||
Start the qmd daemon separately — it keeps models warm in memory:
|
||||
|
||||
```bash
|
||||
# Start daemon (persists across agent restarts)
|
||||
qmd mcp --http --daemon
|
||||
|
||||
# Runs on http://localhost:8181 by default
|
||||
```
|
||||
|
||||
Then configure Hermes Agent to connect via HTTP:
|
||||
|
||||
```yaml
mcp_servers:
  qmd:
    url: "http://localhost:8181/mcp"
    timeout: 30
```
|
||||
|
||||
**Tradeoff:** Uses ~2GB RAM while running, but every query is fast
|
||||
(~2-3s). Best for users who search frequently.
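
Before pointing the agent at the HTTP endpoint, it can help to confirm that something is listening on the port. The sketch below is a plain TCP check in Python, assuming the default `localhost:8181` address mentioned above; `daemon_is_listening` is a hypothetical name, and a successful connection does not prove that the listener is actually qmd or that its models are loaded.

```python
import socket


def daemon_is_listening(host="localhost", port=8181, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns `False`, falling back to stdio mode or to `qmd search` avoids a long connection timeout in the agent.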
|
||||
|
||||
### Keeping the Daemon Running
|
||||
|
||||
#### macOS (launchd)
|
||||
|
||||
```bash
cat > ~/Library/LaunchAgents/com.qmd.daemon.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.qmd.daemon</string>
  <!-- launchd does not inherit your shell PATH; adjust to where qmd and node are installed -->
  <key>EnvironmentVariables</key>
  <dict>
    <key>PATH</key>
    <string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
  </dict>
  <!-- Foreground mode (no daemon flag) so that launchd supervises the process itself -->
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/env</string>
    <string>qmd</string>
    <string>mcp</string>
    <string>--http</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
  <key>StandardOutPath</key>
  <string>/tmp/qmd-daemon.log</string>
  <key>StandardErrorPath</key>
  <string>/tmp/qmd-daemon.log</string>
</dict>
</plist>
EOF

launchctl load ~/Library/LaunchAgents/com.qmd.daemon.plist
```
|
||||
|
||||
#### Linux (systemd user service)
|
||||
|
||||
```bash
mkdir -p ~/.config/systemd/user

cat > ~/.config/systemd/user/qmd-daemon.service << 'EOF'
[Unit]
Description=QMD MCP Daemon
After=network.target

[Service]
# Foreground mode (no --daemon) so that systemd supervises the process itself
ExecStart=/usr/bin/env qmd mcp --http
Restart=on-failure
RestartSec=10
Environment=PATH=/usr/local/bin:/usr/bin:/bin

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable --now qmd-daemon
systemctl --user status qmd-daemon
```
|
||||
|
||||
### MCP Tools Reference
|
||||
|
||||
Once connected, these tools are available as `mcp_qmd_*`:
|
||||
|
||||
| MCP Tool | Maps To | Description |
|----------|---------|-------------|
| `mcp_qmd_search` | `qmd search` | BM25 keyword search |
| `mcp_qmd_vsearch` | `qmd vsearch` | Semantic vector search |
| `mcp_qmd_deep_search` | `qmd query` | Hybrid search + reranking |
| `mcp_qmd_get` | `qmd get` | Retrieve document by ID or path |
| `mcp_qmd_status` | `qmd status` | Index health and stats |
|
||||
|
||||
The MCP tools accept structured JSON queries for multi-mode search:
|
||||
|
||||
```json
{
  "searches": [
    {"type": "lex", "query": "authentication middleware"},
    {"type": "vec", "query": "how user login is verified"}
  ],
  "collections": ["project-docs"],
  "limit": 10
}
```
|
||||
|
||||
## CLI Usage (Without MCP)
|
||||
|
||||
When MCP is not configured, use qmd directly via terminal:
|
||||
|
||||
```
|
||||
terminal(command="qmd query 'what was decided about the API redesign' --json", timeout=30)
|
||||
```
|
||||
|
||||
For setup and management tasks, always use terminal:
|
||||
|
||||
```
|
||||
terminal(command="qmd collection add ~/Documents/notes --name notes")
|
||||
terminal(command="qmd context add qmd://notes 'Personal research notes and ideas'")
|
||||
terminal(command="qmd embed")
|
||||
terminal(command="qmd status")
|
||||
```
|
||||
|
||||
## How the Search Pipeline Works
|
||||
|
||||
Understanding the internals helps choose the right search mode:
|
||||
|
||||
1. **Query Expansion** — A fine-tuned 1.7B model generates 2 alternative
|
||||
queries. The original gets 2x weight in fusion.
|
||||
2. **Parallel Retrieval** — BM25 (SQLite FTS5) and vector search run
|
||||
simultaneously across all query variants.
|
||||
3. **RRF Fusion** — Reciprocal Rank Fusion (k=60) merges results.
|
||||
Top-rank bonus: #1 gets +0.05, #2-3 get +0.02.
|
||||
4. **LLM Reranking** — qwen3-reranker scores top 30 candidates (0.0-1.0).
|
||||
5. **Position-Aware Blending** — Ranks 1-3: 75% retrieval / 25% reranker.
|
||||
Ranks 4-10: 60/40. Ranks 11+: 40/60 (trusts reranker more for long tail).
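
Steps 3 and 5 above can be sketched in Python to make the arithmetic concrete. This is an illustration of weighted Reciprocal Rank Fusion and of the blending weights as described, not qmd's actual code. Two points are assumptions: the top-rank bonus is applied according to a document's best rank in any list, and the retrieval score passed to `blend` is taken to be already normalised to the 0.0-1.0 range of the reranker.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, weights=None, k=60):
    """Weighted RRF: score(d) = sum over lists of weight / (k + rank), ranks 1-based."""
    weights = weights or [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    best_rank = {}
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            scores[doc] += weight / (k + rank)
            best_rank[doc] = min(rank, best_rank.get(doc, rank))
    # Top-rank bonus (assumption: based on the best rank a document reached in any list).
    for doc, rank in best_rank.items():
        if rank == 1:
            scores[doc] += 0.05
        elif rank <= 3:
            scores[doc] += 0.02
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def blend(position, retrieval_score, rerank_score):
    """Position-aware blend: trust retrieval near the top, the reranker further down."""
    if position <= 3:
        retrieval_weight = 0.75
    elif position <= 10:
        retrieval_weight = 0.60
    else:
        retrieval_weight = 0.40
    return retrieval_weight * retrieval_score + (1 - retrieval_weight) * rerank_score
```

For example, `rrf_fuse([original_hits, expanded_hits], weights=[2.0, 1.0])` gives the original query twice the weight of an expansion, matching step 1. With k=60 the raw contribution of a single list is at most 1/61, so the bonuses are large relative to the fused scores and strongly favour documents that any list ranked first.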
|
||||
|
||||
**Smart Chunking:** Documents are split at natural break points (headings,
|
||||
code blocks, blank lines) targeting ~900 tokens with 15% overlap. Code
|
||||
blocks are never split mid-block.
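
A simplified version of this chunking strategy is sketched below in Python. It is not qmd's implementation: whitespace-separated words stand in for tokens, break points are limited to blank lines, headings, and fence boundaries, and overlap is achieved by repeating whole trailing blocks rather than exact token windows. The names `split_blocks` and `chunk` are hypothetical.

```python
FENCE = "`" * 3  # Markdown code-fence marker


def split_blocks(text):
    """Split markdown at blank lines and headings, keeping fenced code blocks whole."""
    blocks, current, in_fence = [], [], False

    def flush():
        if current:
            blocks.append("\n".join(current))
            current.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if not in_fence:
                flush()              # opening fence ends the preceding block
            current.append(line)
            if in_fence:
                flush()              # closing fence completes the code block
            in_fence = not in_fence
        elif in_fence:
            current.append(line)     # never break inside a code block
        elif stripped == "" or line.startswith("#"):
            flush()
            if stripped:
                current.append(line)  # a heading starts a new block
        else:
            current.append(line)
    flush()
    return blocks


def chunk(text, target_tokens=900, overlap=0.15):
    """Pack blocks into chunks of roughly target_tokens words with block-level overlap."""
    chunks, current, size = [], [], 0
    for block in split_blocks(text):
        n = len(block.split())
        if current and size + n > target_tokens:
            chunks.append("\n\n".join(current))
            # Carry trailing blocks worth up to overlap * target into the next chunk.
            carried, carried_size = [], 0
            for prev in reversed(current[1:]):
                m = len(prev.split())
                if carried_size + m > target_tokens * overlap:
                    break
                carried.insert(0, prev)
                carried_size += m
            current, size = carried, carried_size
        current.append(block)
        size += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A block larger than the target (a long code listing, for instance) becomes an oversized chunk of its own in this sketch, since splitting it would break the no-mid-block rule.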
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Always add context descriptions** — `qmd context add` dramatically
|
||||
improves retrieval accuracy. Describe what each collection contains.
|
||||
2. **Re-embed after adding documents** — `qmd embed` must be re-run when
|
||||
new files are added to collections.
|
||||
3. **Use `qmd search` for speed** — when you need fast keyword lookup
|
||||
(code identifiers, exact names), BM25 is instant and needs no models.
|
||||
4. **Use `qmd query` for quality** — when the question is conceptual or
|
||||
the user needs the best possible results, use hybrid search.
|
||||
5. **Prefer MCP integration** — once configured, the agent gets native
|
||||
tools without needing to load this skill each time.
|
||||
6. **Daemon mode for frequent users** — if the user searches their
|
||||
knowledge base regularly, recommend the HTTP daemon setup.
|
||||
7. **First query in structured search gets 2x weight** — put the most
|
||||
important/certain query first when combining lex and vec.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Models downloading on first run"
|
||||
Normal — qmd auto-downloads ~2GB of GGUF models on first use.
|
||||
This is a one-time operation.
|
||||
|
||||
### Cold start latency (~19s)
|
||||
This happens when models aren't loaded in memory. Solutions:
|
||||
- Use HTTP daemon mode (`qmd mcp --http --daemon`) to keep warm
|
||||
- Use `qmd search` (BM25 only) when models aren't needed
|
||||
- MCP stdio mode loads models on first search, stays warm for session
|
||||
|
||||
### macOS: "unable to load extension"
|
||||
Install Homebrew SQLite: `brew install sqlite`
|
||||
Then ensure it's on PATH before system SQLite.
|
||||
|
||||
### "No collections found"
|
||||
Run `qmd collection add <path> --name <name>` to add directories,
|
||||
then `qmd embed` to index them.
|
||||
|
||||
### Embedding model override (CJK/multilingual)
|
||||
Set the `QMD_EMBED_MODEL` environment variable to use a different embedding model for non-English content:
|
||||
```bash
|
||||
export QMD_EMBED_MODEL="your-multilingual-model"
|
||||
```
|
||||
|
||||
## Data Storage
|
||||
|
||||
- **Index & vectors:** `~/.cache/qmd/index.sqlite`
|
||||
- **Models:** Auto-downloaded to local cache on first run
|
||||
- **No cloud dependencies** — everything runs locally
|
||||
|
||||
## References
|
||||
|
||||
- [GitHub: tobi/qmd](https://github.com/tobi/qmd)
|
||||
- [QMD Changelog](https://github.com/tobi/qmd/blob/main/CHANGELOG.md)
|
||||