| name | description |
|---|---|
| ffmpeg-editing | Plan and execute deterministic audio and video edits with ffmpeg and ffprobe. Use when an AI agent needs to cut clips, concatenate videos, reorder segments, replace or mix audio, burn or mux captions, add text or image overlays, reframe footage for vertical or square formats such as 9:16, add transitions between clips, change speed, extract frames, normalize exports, or translate a plain-English editing request into concrete ffmpeg commands or scripts. |
FFmpeg Editing
Overview
Inspect the media first, then choose the simplest edit path that satisfies the request with the least quality loss.
Prefer stream copy for pure trims, remuxes, and compatible concatenation. Re-encode when the request involves filters, frame-accurate cuts, captions, overlays, speed changes, reframing, or mixed audio.
Quick Start
- Inspect every input with `ffprobe`.
- Normalize the request into an edit plan:
  - inputs
  - desired output
  - exact time ranges
  - whether timing must be frame-accurate
  - whether subtitles are burned in or soft
  - whether original audio must be preserved, replaced, or mixed
- Choose the edit family:
  - trim/remux
  - concat/reorder
  - filter-based video edit
  - filter-based audio edit
  - subtitle or overlay pass
- Choose stream copy or re-encode deliberately.
- Build explicit `-map` rules instead of relying on default stream selection.
- For larger graphs, write a `-filter_complex_script` file instead of an unreadable inline filter string.
- For MP4 outputs, usually add `-movflags +faststart`.
Scripts
Prefer the bundled scripts in simple or repetitive cases before writing raw ffmpeg by hand:
- `scripts/trim-clip.sh`: cut one file by `start/end` or `start/+duration`, with `copy` or `accurate` mode.
- `scripts/merge-clips.sh`: concatenate already-compatible clips after checking their stream signatures.
- `scripts/make-vertical.sh`: export a `9:16` version with `crop`, `pad`, or `dynamic` motion mode.
- `scripts/render-meme-vertical.sh`: build a meme-style vertical render with a blurred background plate, centered source clip, and wrapped top/bottom caption cards.
- `scripts/replace-audio.sh`: attach a new audio track to an existing video.
- `scripts/mix-audio.sh`: mix or duck background music under the original track.
- `scripts/burn-captions.sh`: burn `.srt`, `.vtt`, or `.ass` captions into the picture.
- `scripts/transition-two-clips.sh`: build a normalized two-clip render with `xfade` and `acrossfade`.
Use the scripts for the common path. Fall back to references/patterns.md when the request needs a custom graph or a multi-stage edit.
When the edit uses still images or transparent cutout PNGs downloaded during the heavy-assets phase, treat them as first-class overlay inputs, the same as local PNGs, logos, or cutouts. Keep them in `assets/` so the montage step can reference them mechanically.
Workflow
1. Inspect Inputs
Run ffprobe before writing commands. Capture:
- duration
- resolution
- frame rate
- pixel format
- video codec
- audio codec
- channel layout
- subtitle streams
- time base issues or variable frame rate
Use this information to decide whether stream copy is safe, whether concat demuxer can work, and whether a compatibility transcode is needed first.
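The inspection step can be sketched as two small shell functions. The names `probe_media` and `video_frame_rates` are hypothetical and are not part of the bundled scripts; the field list is one reasonable selection for the checklist above, not a required set.

```shell
# Hypothetical helper: dump the checklist fields for every stream as JSON.
# $1 = media file
probe_media() {
  ffprobe -v error \
    -show_entries format=duration:stream=index,codec_type,codec_name,width,height,pix_fmt,r_frame_rate,avg_frame_rate,time_base,sample_rate,channel_layout \
    -of json "$1"
}

# Hypothetical helper: print the nominal and average frame rate of the
# first video stream, one key=value pair per line.
# $1 = media file
video_frame_rates() {
  ffprobe -v error -select_streams v:0 \
    -show_entries stream=r_frame_rate,avg_frame_rate \
    -of default=noprint_wrappers=1 "$1"
}
```

A typical call is `probe_media input.mp4`. When `r_frame_rate` and `avg_frame_rate` disagree, the source is probably variable frame rate; treat that as a heuristic, and consider a compatibility transcode before stream-copy concatenation.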
2. Classify the Request
Map the user request to one of these patterns:
- Cut a clip: trim one source into one output.
- Merge videos: concatenate compatible clips or use concat filter after normalizing them.
- Apply sound to video: replace audio, mix music under speech, or keep only one track.
- Apply captions: burn captions into video or mux subtitle streams.
- Make vertical: scale, crop, and optionally zoom for `9:16`.
- Add transitions: crossfade, fade-to-black, fade-to-white, wipe, or slide between adjacent clips.
- Add text/logo: use `drawtext` or `overlay`.
- Composite downloaded stills or cutouts: use `overlay` with the downloaded image or `*.nobg.png` asset from the heavy-assets phase.
- Build a meme short: trim a punchline moment, reframe to `9:16`, then overlay strong top/bottom text without letting long lines run off the frame.
- Build a meme still: composite one source frame with downloaded overlay assets, then add strong top/bottom caption bars.
- Speed up / slow down: use `setpts` and `atempo`.
- Build a short edit: trim multiple ranges, transform each segment, then concat.
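As one illustration of mapping a pattern to a command, the speed-change pattern could look like the sketch below. `change_speed` is a hypothetical name. The sketch assumes the input has both a video and an audio stream and that the factor stays between 0.5 and 2.0, a range a single `atempo` instance accepts on all common FFmpeg versions.

```shell
# Hypothetical helper: change playback speed of the first video and audio stream.
# $1 = input, $2 = speed factor (assumed 0.5 to 2.0), $3 = output
change_speed() {
  # setpts divides timestamps to speed video up; atempo changes audio
  # tempo by the same factor without shifting pitch.
  ffmpeg -hide_banner -i "$1" -filter_complex \
    "[0:v:0]setpts=PTS/$2[v];[0:a:0]atempo=$2[a]" \
    -map '[v]' -map '[a]' \
    -c:v libx264 -crf 20 -pix_fmt yuv420p -c:a aac \
    -movflags +faststart "$3"
}
```

For example, `change_speed input.mp4 1.5 fast.mp4` plays the clip at 1.5x. Larger factors would need chained `atempo` filters, which this sketch does not attempt.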
If the source is a YouTube URL and the task is only to fetch one segment in this repository, prefer ../download-youtube-segment/scripts/download-clip.py before doing further editing.
3. Choose Copy vs Re-encode
Use stream copy when all of these are true:
- no filter is required
- approximate keyframe-aligned cutting is acceptable
- codecs and container are already acceptable
Re-encode when any of these are true:
- cut points must be exact
- subtitles must be burned in
- text, image, crop, scale, pad, blur, zoom, or transitions are needed
- audio must be mixed, ducked, faded, or normalized
- clips need normalization before concatenation
4. Build Commands Deliberately
Apply these rules:
- Use explicit `-map` values.
- Set codecs intentionally instead of relying on defaults for production outputs.
- Use `libx264 -crf 18-23 -pix_fmt yuv420p` for broadly compatible H.264 delivery unless the user needs something else.
- Use AAC for common MP4 audio delivery.
- Use `-shortest` only when you explicitly want the output to end at the shortest stream.
- For accurate trims, prefer filter-based `trim`/`atrim` or place `-ss` after input with re-encode.
- For fast rough trims, place `-ss` before input and copy when acceptable.
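A minimal sketch of a delivery encode that follows these rules is shown here. `encode_delivery` is a hypothetical name, and the CRF of 20, the `medium` preset, and the 192k audio bitrate are illustrative picks rather than values the workflow prescribes.

```shell
# Hypothetical helper: re-encode to broadly compatible H.264/AAC MP4.
# $1 = input, $2 = output (.mp4)
encode_delivery() {
  # Explicit maps: first video stream, first audio stream if one exists
  # (the trailing "?" makes the audio map optional).
  ffmpeg -hide_banner -i "$1" \
    -map 0:v:0 -map '0:a:0?' \
    -c:v libx264 -crf 20 -preset medium -pix_fmt yuv420p \
    -c:a aac -b:a 192k \
    -movflags +faststart \
    "$2"
}
```

Usage: `encode_delivery source.mov delivery.mp4`. Quoting `'0:a:0?'` keeps the shell from treating `?` as a glob character.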
5. Validate the Output
After rendering, inspect the output with ffprobe and verify:
- expected duration
- expected resolution and aspect ratio
- expected stream count
- audio is present and synchronized
- captions/overlays appear when expected
If the user asked for a reusable workflow, keep the command readable and parameterized.
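Part of the validation checklist can be automated. The sketch below checks duration against an expected value and lists the output streams; `check_duration` and `stream_summary` are hypothetical names, and the default tolerance of 0.1 seconds is an assumption to adjust per project.

```shell
# Hypothetical helper: succeed when the container duration is within a
# tolerance of the expected value.
# $1 = file, $2 = expected seconds, $3 = tolerance in seconds (default 0.1)
check_duration() {
  actual=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1") || return 1
  # awk does the floating-point comparison that plain sh cannot.
  awk -v a="$actual" -v e="$2" -v t="${3:-0.1}" \
    'BEGIN { d = a - e; if (d < 0) d = -d; exit (d <= t) ? 0 : 1 }'
}

# Hypothetical helper: print one CSV line per stream with its codec
# type and codec name, so the stream count can be compared to the plan.
# $1 = file
stream_summary() {
  ffprobe -v error -show_entries stream=codec_type,codec_name -of csv=p=0 "$1"
}
```

For example, `check_duration out.mp4 12.5 || echo "unexpected duration"`. Sync and caption placement still need a visual or listening check; this only covers what `ffprobe` can report.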
Decision Rules
Trim One Clip
- Use a copy trim for speed and no generation loss when keyframe-aligned cut points are acceptable.
- Use re-encode trim for frame accuracy.
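The two trim modes differ mainly in where `-ss` sits and whether streams are copied. The sketch below uses hypothetical function names and is not the bundled `scripts/trim-clip.sh`; the CRF value is illustrative.

```shell
# Fast trim: seek on the input side and copy streams.
# Cut points snap to keyframes, so the start may be slightly off.
# $1 = input, $2 = start, $3 = duration, $4 = output
trim_copy() {
  ffmpeg -hide_banner -ss "$2" -i "$1" -t "$3" \
    -map 0:v:0 -map '0:a?' -c copy \
    -avoid_negative_ts make_zero "$4"
}

# Accurate trim: seek on the output side and re-encode, so the cut
# lands on the requested frame at the cost of a new generation.
# $1 = input, $2 = start, $3 = duration, $4 = output
trim_accurate() {
  ffmpeg -hide_banner -i "$1" -ss "$2" -t "$3" \
    -map 0:v:0 -map '0:a?' \
    -c:v libx264 -crf 20 -pix_fmt yuv420p -c:a aac \
    -movflags +faststart "$4"
}
```

Usage: `trim_copy in.mp4 00:01:05 10 rough.mp4` or `trim_accurate in.mp4 65.2 10 exact.mp4`.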
Concatenate Clips
- Use the concat demuxer when clips already match codec, time base, dimensions, and stream layout.
- Use the concat filter when clips differ or need per-clip transforms first.
- Use `xfade` and `acrossfade` when the user wants polished clip-to-clip transitions instead of hard cuts.
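For the concat demuxer path, a minimal sketch looks like this. The function names are hypothetical and this is not the bundled `scripts/merge-clips.sh`. It assumes the clips were already verified as compatible.

```shell
# Write a concat-demuxer list file.
# $1 = list path, remaining args = clips in playback order
write_concat_list() {
  list=$1
  shift
  : > "$list"
  for clip in "$@"; do
    # Escape single quotes for the list syntax: ' becomes '\''
    escaped=$(printf '%s' "$clip" | sed "s/'/'\\\\''/g")
    printf "file '%s'\n" "$escaped" >> "$list"
  done
}

# Concatenate the listed clips with stream copy.
# $1 = list path, $2 = output
concat_copy() {
  # -safe 0 lets the list contain absolute or otherwise "unsafe" paths.
  ffmpeg -hide_banner -f concat -safe 0 -i "$1" -c copy "$2"
}
```

Usage: `write_concat_list list.txt a.mp4 b.mp4 && concat_copy list.txt merged.mp4`. Relative paths in the list are resolved relative to the list file, so keep the list next to the clips or use absolute paths.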
Replace or Mix Audio
- Replace audio by mapping the video from one input and audio from another.
- Mix audio with `amix` or `sidechaincompress` when speech must stay clear over music.
- Fade music in or out instead of hard starts and stops unless the user asked for abrupt edits.
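Both audio paths can be sketched as follows. The function names are hypothetical and are not the bundled scripts. The music volume and compressor settings are illustrative starting points, not tuned values.

```shell
# Replace audio: video from input 0, audio from input 1.
# -shortest is deliberate here: the output ends with the shorter stream.
# $1 = video, $2 = new audio, $3 = output
replace_audio() {
  ffmpeg -hide_banner -i "$1" -i "$2" \
    -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac -shortest \
    -movflags +faststart "$3"
}

# Duck music under speech: the original track drives a sidechain
# compressor on the music, then both are mixed.
# $1 = video with speech, $2 = music, $3 = output
duck_music() {
  ffmpeg -hide_banner -i "$1" -i "$2" -filter_complex \
    "[0:a]asplit=2[voice][key];\
[1:a]volume=0.6[music];\
[music][key]sidechaincompress=threshold=0.05:ratio=8:attack=20:release=300[ducked];\
[voice][ducked]amix=inputs=2:duration=first[aout]" \
    -map 0:v:0 -map '[aout]' -c:v copy -c:a aac \
    -movflags +faststart "$3"
}
```

Usage: `duck_music talk.mp4 music.mp3 mixed.mp4`. Note that `amix` rescales its inputs by default, so the result may need a loudness pass afterwards.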
Apply Captions
- Burn captions into the picture for platform-safe delivery or when the user wants styled subtitles.
- Mux subtitles as soft tracks when the user needs togglable captions.
- If no subtitle file or transcript exists, note that speech-to-text is a separate step.
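The burned and soft caption paths can be sketched as two functions. The names are hypothetical and this is not the bundled `scripts/burn-captions.sh`. The burn-in sketch assumes an FFmpeg build with libass and a simple subtitle path with no quotes, colons, or backslashes, since filter arguments need extra escaping otherwise.

```shell
# Burn captions into the picture. Video is re-encoded, audio is copied.
# $1 = video, $2 = subtitle file (simple path, see assumption above), $3 = output
burn_captions() {
  ffmpeg -hide_banner -i "$1" -vf "subtitles=$2" \
    -map 0:v:0 -map '0:a?' \
    -c:v libx264 -crf 20 -pix_fmt yuv420p -c:a copy \
    -movflags +faststart "$3"
}

# Mux captions as a togglable soft track in MP4.
# No re-encode of audio or video; mov_text is the MP4 subtitle codec.
# $1 = video, $2 = subtitle file, $3 = output (.mp4)
mux_soft_captions() {
  ffmpeg -hide_banner -i "$1" -i "$2" \
    -map 0:v:0 -map '0:a?' -map 1:0 \
    -c:v copy -c:a copy -c:s mov_text \
    -movflags +faststart "$3"
}
```

Usage: `burn_captions talk.mp4 talk.srt talk.burned.mp4` or `mux_soft_captions talk.mp4 talk.srt talk.soft.mp4`.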
Make Social Formats
- Use crop for intentional reframing.
- Use pad when preserving the full frame matters more than filling the canvas.
- Keep `9:16` as a first-class output path for Shorts, Reels, and TikTok-style requests.
- Add slight zoom or drift only when it supports the framing. Avoid constant motion on every clip.
- Keep output fps explicit when building short-form deliverables.
- For phone-first meme shorts, prefer a true fullscreen reframe from the source whenever the joke can survive cropping; do not default to a small horizontal clip floating inside a blurred background.
- Use the centered-foreground-over-background pattern as a fallback when preserving the whole horizontal frame matters more than screen occupancy or when cropping would destroy the beat.
- Different beats in the same meme may use different reframes. A setup can stay wider while the punchline snaps into a tighter fullscreen crop.
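The crop and pad choices translate into two filter chains. The sketch below assumes a 1080x1920 canvas and 30 fps, which are common but not mandated here. The function names are hypothetical and this is not the bundled `scripts/make-vertical.sh`.

```shell
# Print the video filter chain for a 1080x1920 reframe.
# $1 = mode: "crop" fills the canvas, "pad" preserves the full frame
vertical_filter() {
  case "$1" in
    crop)
      printf '%s' 'scale=1080:1920:force_original_aspect_ratio=increase,crop=1080:1920,setsar=1'
      ;;
    pad)
      printf '%s' 'scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2,setsar=1'
      ;;
    *)
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}

# Render a vertical export with an explicit output frame rate.
# $1 = mode, $2 = input, $3 = output
make_vertical() {
  vf=$(vertical_filter "$1") || return 1
  ffmpeg -hide_banner -i "$2" -vf "$vf" -r 30 \
    -map 0:v:0 -map '0:a?' \
    -c:v libx264 -crf 20 -pix_fmt yuv420p -c:a aac \
    -movflags +faststart "$3"
}
```

Usage: `make_vertical crop wide.mp4 vertical.mp4`. The crop here is centered; an off-center subject needs explicit `crop` x/y offsets chosen from the actual frame.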
Add Text Or Meme Captions
- For short meme renders, prefer `scripts/render-meme-vertical.sh` over ad hoc `drawtext` when the output needs top/bottom reaction text.
- Never assume the caption fits on one line. Wrap long phrases into a bounded caption box so the text stays inside the frame.
- Prefer overlaying rendered caption cards for multi-line meme text instead of building brittle single-line `drawtext` expressions.
- Avoid thick harsh outlines by default. Prefer a thinner dark contour plus a soft shadow so the text stays readable without looking cheap.
- Keep meme captions centered and leave explicit margins from the top and bottom edges.
- If the output is a still meme rather than a video, it is acceptable to pre-render the caption bars/cards with ImageMagick using a real bold font and then composite them deterministically. Preserve the same wrapped-card look instead of dropping back to weak inline text.
- For fullscreen TikTok renders, size caption cards for actual phone readability rather than reusing small caption assets from the horizontal version.
- Re-seat glasses, masks, and other face overlays after every major reframe; coordinates that worked in a horizontal crop should be treated as invalid after a vertical fullscreen recut.
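The wrapping rule can be approximated with standard text tools before any card is rendered. This is a sketch under a loud assumption: a fixed character count per line stands in for the real pixel width, which depends on font and size. `wrap_caption` is a hypothetical name.

```shell
# Hypothetical helper: wrap caption text at word boundaries.
# $1 = caption text, $2 = max characters per line (default 22)
wrap_caption() {
  # fold -s breaks at the last blank that fits; sed strips the
  # trailing blank that fold leaves at the end of each wrapped line.
  printf '%s\n' "$1" | fold -s -w "${2:-22}" | sed 's/[[:space:]]*$//'
}
```

For example, `wrap_caption 'one two three four' 9` prints three lines: `one two`, `three`, and `four`. The wrapped text can then feed whatever renders the caption card. A single word longer than the limit is still split mid-word, so check unusually long tokens by hand.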
Composite Downloaded Stills
- Prefer placing externally downloaded stills, logos, or product photos into `assets/` during the heavy-assets phase before you write the ffmpeg command.
- If the still needs transparency, run remove-background/SKILL.md first and use the resulting PNG as the overlay input.
- Keep overlay asset names beat-oriented so the edit can reference them without reconstructing the scenario.
- For overlays prepared in the heavy-assets phase, prefer the downloaded and locally cleaned asset that best supports the beat.
- Position the overlay against the actual face/head landmarks in the chosen frame; do not leave the asset floating off-face just because the download step succeeded.
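A still or cutout overlay can be sketched as a single filter graph. `overlay_still` is a hypothetical name, and the width, position, and time window are parameters the caller must pick from the actual frame; nothing here detects faces.

```shell
# Hypothetical helper: overlay a PNG on a video for a time window.
# $1 = video, $2 = PNG (for example a *.nobg.png cutout), $3 = overlay width px,
# $4 = x, $5 = y, $6 = start s, $7 = end s, $8 = output
overlay_still() {
  # scale keeps the PNG aspect ratio (-1); enable limits the overlay
  # to the chosen beat. Quotes protect the commas inside between().
  ffmpeg -hide_banner -i "$1" -i "$2" -filter_complex \
    "[1:v]scale=$3:-1[ov];[0:v][ov]overlay=x=$4:y=$5:enable='between(t,$6,$7)'[v]" \
    -map '[v]' -map '0:a?' \
    -c:v libx264 -crf 20 -pix_fmt yuv420p -c:a copy \
    -movflags +faststart "$8"
}
```

Usage: `overlay_still clip.mp4 assets/glasses.nobg.png 320 380 210 1.5 3.0 out.mp4`, where the file name and coordinates are made up for illustration.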
Add Transitions
- Prefer short transitions, usually `0.10-0.35` seconds, for social edits unless the user wants a slower dramatic style.
- Use plain cuts for beat-driven action when transitions would blur impact.
- Use `xfade` for video and `acrossfade` for audio so the transition feels cohesive.
- Normalize resolution, fps, sample rate, and channel layout before applying transitions.
- For larger edits, put the full graph in a filter script instead of building one long quoted command.
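A two-clip transition can be sketched as below. The function names are hypothetical and this is not the bundled `scripts/transition-two-clips.sh`. It assumes both clips are already normalized to the same resolution, fps, sample rate, and channel layout, and that the caller supplies the first clip's duration (for example from `ffprobe`).

```shell
# The xfade offset is where the transition starts on the first clip's
# timeline: first clip duration minus transition duration.
# $1 = first clip duration s, $2 = transition duration s
xfade_offset() {
  awk -v d="$1" -v t="$2" 'BEGIN { printf "%.3f\n", d - t }'
}

# Crossfade video with xfade and audio with acrossfade.
# $1 = clip A, $2 = clip B, $3 = duration of A in seconds,
# $4 = transition seconds, $5 = output
crossfade_pair() {
  off=$(xfade_offset "$3" "$4")
  ffmpeg -hide_banner -i "$1" -i "$2" -filter_complex \
    "[0:v][1:v]xfade=transition=fade:duration=$4:offset=$off[v];[0:a][1:a]acrossfade=d=$4[a]" \
    -map '[v]' -map '[a]' \
    -c:v libx264 -crf 20 -pix_fmt yuv420p -c:a aac \
    -movflags +faststart "$5"
}
```

For example, `xfade_offset 8 0.25` prints `7.750`, and `crossfade_pair a.mp4 b.mp4 8 0.25 out.mp4` starts a quarter-second fade at that point.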
Reference Files
Read references/patterns.md for command skeletons covering the main editing patterns.