2026-04-06 22:12:48 +03:00

name	description
story-gen	Generate a structured video scenario (JSON) from any input: product description, idea, joke, educational topic, or URL. Adapts to platform (TikTok, WB, YouTube, Instagram, VK), audience, and content restrictions. Returns scenes with detailed visual prompts for image/video generation, voiceover text, captions, and timing. Use when: user wants to create any video — ad, viral reel, educational, postcard, long-form (2 min), or product showcase.

Story Gen

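Universal video scenario generator for any content type (ads, viral reels, educational videos, postcards, long-form videos, product showcases) on any supported platform (TikTok, Instagram, Wildberries, YouTube, VK).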

Language rules

This skill and all its documentation are written in English only
Input can be in any language: Russian, English, Chinese, etc.
visual_prompt is always in English (required by gpt-image-1.5 and veo-3.1)
voiceover and caption use the input language, or the language set by --lang
With --lang auto (the default), the language is detected automatically from the input
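
How `--lang auto` picks a language is not specified here. As an illustration only, the sketch below assumes a simple script-based heuristic (Cyrillic letters mean Russian, otherwise English). `detect_lang` and `resolve_lang` are hypothetical names, not functions from generate.py, and a real detector would need to cover more languages, such as `de`.

```python
import re

# Hypothetical heuristic for --lang auto: NOT the actual detector in generate.py.
# It only distinguishes Russian (Cyrillic script) from English (Latin script).
CYRILLIC = re.compile(r"[\u0400-\u04FF]")
LATIN = re.compile(r"[A-Za-z]")


def detect_lang(text: str, default: str = "en") -> str:
    """Guess the voiceover/caption language from the script of the input text."""
    cyr = len(CYRILLIC.findall(text))
    lat = len(LATIN.findall(text))
    if cyr == 0 and lat == 0:
        return default  # no letters to judge by (e.g. a bare number)
    return "ru" if cyr >= lat else "en"


def resolve_lang(user_input: str, lang: str = "auto") -> str:
    """An explicit --lang wins; 'auto' falls back to detection."""
    return detect_lang(user_input) if lang == "auto" else lang


if __name__ == "__main__":
    print(resolve_lang("Обзор беговых кроссовок Nike для TikTok"))  # ru
    print(resolve_lang("How to choose your first bicycle"))  # en
    print(resolve_lang("Анекдот про программиста и кофе", lang="en"))  # en
```
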

When to use

User wants to make a video for Wildberries, TikTok, Instagram, YouTube, VK
User has a product, idea, joke, or topic and wants a ready script
Pipeline needs structured JSON with visual prompts + voiceover for next steps
User provides assets (photos, URLs) that need analysis before scripting

Setup

Required environment variables:

OPENAI_API_KEY — API key
OPENAI_BASE_URL — endpoint (default: https://llm.lambda.coredump.ru/v1)
STORY_MODEL — model (default: qwen3.5-122b)
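
The following is a minimal sketch of how these variables and their documented defaults could be resolved at startup. `load_llm_config` is a hypothetical helper, not part of scripts/generate.py. Failing fast on a missing key is a design assumption.

```python
import os
from typing import Mapping

# Defaults as documented in Setup; the helper itself is hypothetical.
DEFAULT_BASE_URL = "https://llm.lambda.coredump.ru/v1"
DEFAULT_MODEL = "qwen3.5-122b"


def load_llm_config(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve the LLM endpoint settings, failing fast if the API key is missing."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL),
        "model": env.get("STORY_MODEL", DEFAULT_MODEL),
    }
```

The resulting values fit an OpenAI-compatible client: for example, `OpenAI(api_key=cfg["api_key"], base_url=cfg["base_url"])` from the `openai` Python package, with `cfg["model"]` passed as the `model` of each request. Whether generate.py uses that package is an assumption.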

Two modes

`--mode image` (default)

Generates a storyboard scenario for image/visual generation. Each scene has a visual_prompt (English) ready for gpt-image-1.5 or veo-3.1.

For trend-photo requests such as anime portrait, studio headshot, USSR postcard, photo booth, aged self, flowers in hair, and the other curated portrait trends stored under scripts/trends/, generate.py (with --format trend_photo) can also emit a single-image scenario (the image_request block) that downstream image generation consumes directly.

`--mode video`

Generates a full shooting script for real video production. Each scene has:

timecode — cumulative start time HH:MM:SS
voiceover — exact words spoken by narrator (in target language)
action — what happens on screen in English (for director / video generation)
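
Because each timecode is cumulative, it equals the sum of the durations of all earlier scenes. The sketch below shows that arithmetic and assumes integer `duration_sec` values. `to_timecode` and `assign_timecodes` are illustrative names, not the script's API.

```python
def to_timecode(total_seconds: int) -> str:
    """Format a whole number of seconds as HH:MM:SS."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def assign_timecodes(scenes: list[dict]) -> list[dict]:
    """Give each scene the cumulative start time of all scenes before it."""
    start = 0
    for scene in scenes:
        scene["timecode"] = to_timecode(start)
        start += scene["duration_sec"]  # assumes integer seconds
    return scenes


if __name__ == "__main__":
    demo = assign_timecodes([{"id": 1, "duration_sec": 5},
                             {"id": 2, "duration_sec": 8},
                             {"id": 3, "duration_sec": 47}])
    print([s["timecode"] for s in demo])  # ['00:00:00', '00:00:05', '00:00:13']
```
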

When --out is given, video mode automatically saves two files (names shown for --out scenario.json):

scenario.json: the full structured script
scenario_voiceover.txt: the voiceover in `[HH:MM:SS] text` format, ready for voice/voice_acting.py
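
The sketch below shows how the voiceover file could be derived from the JSON. It assumes one `[HH:MM:SS] text` line per scene and that scenes with empty narration are skipped; both are assumptions about the exact file layout. The helper names are hypothetical.

```python
from pathlib import Path


def voiceover_path(out: str) -> Path:
    """assets/scenario.json -> assets/scenario_voiceover.txt"""
    p = Path(out)
    return p.with_name(f"{p.stem}_voiceover.txt")


def voiceover_lines(scenario: dict) -> list[str]:
    """One '[HH:MM:SS] text' line per scene; scenes without narration are skipped."""
    lines = []
    for scene in scenario["scenes"]:
        text = " ".join(scene.get("voiceover", "").split())  # collapse newlines/spaces
        if text:
            lines.append(f"[{scene['timecode']}] {text}")
    return lines


def write_voiceover(scenario: dict, out: str) -> Path:
    """Write the voiceover file next to the JSON given by --out."""
    path = voiceover_path(out)
    path.write_text("\n".join(voiceover_lines(scenario)) + "\n", encoding="utf-8")
    return path
```
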

Parameters

Parameter	Values	Description
`--mode`	`image`, `video`	`image`: visual storyboard; `video`: full shooting script with voiceover
`--format`	`wb_ad`, `reels`, `viral`, `long`, `postcard`, `educational`, `trend_photo`, `auto`	Video format (image mode only)
`--platform`	`tiktok`, `instagram`, `wb`, `youtube`, `vk`, `auto`	Target platform
`--audience`	any text	Target audience description
`--duration`	seconds	Target video duration in seconds
`--lang`	`ru`, `en`, `de`, `auto`	Language for voiceover and captions
`--photo`	filepath	Reference photo path for trend-photo scenarios
`--analyze`	flag	Analyze assets before generating (image mode only)
`--out`	filepath	Save JSON to file (video mode also saves `_voiceover.txt`)
`--voice`	flag	Run voice synthesis immediately after script generation (requires `--mode video` and `--out`)
`--voice-out`	dirpath	Directory for voice segments (default: `voice_segments/` next to `--out`)
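
The parameter table maps naturally onto a command-line parser. The following argparse reconstruction is hypothetical and for illustration only; it is not the actual parser in generate.py. Defaults of `auto` for `--format` and `--platform` and an integer `--duration` are assumptions. The `--voice` check and the `voice_segments/` default mirror the table.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented flags; the real parser in
    # generate.py may differ in defaults, types, and validation.
    p = argparse.ArgumentParser(prog="generate.py")
    p.add_argument("input", help="product description, idea, joke, topic, or URL")
    p.add_argument("--mode", choices=["image", "video"], default="image")
    p.add_argument("--format", default="auto",
                   choices=["wb_ad", "reels", "viral", "long", "postcard",
                            "educational", "trend_photo", "auto"])
    p.add_argument("--platform", default="auto",
                   choices=["tiktok", "instagram", "wb", "youtube", "vk", "auto"])
    p.add_argument("--audience", help="target audience description")
    p.add_argument("--duration", type=int, help="target duration in seconds (int assumed)")
    p.add_argument("--lang", default="auto", choices=["ru", "en", "de", "auto"])
    p.add_argument("--photo", help="reference photo for trend-photo scenarios")
    p.add_argument("--analyze", action="store_true")
    p.add_argument("--out")
    p.add_argument("--voice", action="store_true")
    p.add_argument("--voice-out")
    return p


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # --voice needs a saved video-mode script to read the voiceover file from.
    if args.voice and (args.mode != "video" or not args.out):
        parser.error("--voice requires --mode video and --out")  # exits with status 2
    # Default voice directory: voice_segments/ next to the --out file.
    if args.voice_out is None and args.out:
        args.voice_out = str(Path(args.out).parent / "voice_segments")
    return args
```
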

Usage examples

```bash
# WB product ad: image storyboard (default mode)
# Input: "Women's eco-leather handbag, beige, 2500 rub"
python3 {baseDir}/scripts/generate.py \
  "Женская сумка из экокожи, бежевая, 2500 руб" \
  --format wb_ad --platform wb

# Full video shooting script + automatically run voice synthesis
# Input: "Review of Nike running shoes for TikTok"
python3 {baseDir}/scripts/generate.py \
  "Обзор беговых кроссовок Nike для TikTok" \
  --mode video --platform tiktok --duration 60 --lang ru \
  --out assets/scenario.json --voice
# → saves assets/scenario.json
# → saves assets/scenario_voiceover.txt
# → runs voice_acting.py → saves wav segments to assets/voice_segments/
# → saves assets/voice_segments/segments.txt (manifest for combine_audio.sh)

# Without --voice (run voice synthesis manually afterwards), same input:
python3 {baseDir}/scripts/generate.py \
  "Обзор беговых кроссовок Nike для TikTok" \
  --mode video --platform tiktok --duration 60 --lang ru \
  --out assets/scenario.json
# Then manually:
python3 voice/voice_acting.py assets/scenario_voiceover.txt -o assets/voice_segments

# Viral TikTok image storyboard (Russian input, English voiceover)
# Input: "A joke about a programmer and coffee"
python3 {baseDir}/scripts/generate.py \
  "Анекдот про программиста и кофе" \
  --format viral --platform tiktok --lang en

# Curated trend-photo scenario for downstream image generation
# Input: "Make me in anime style"
python3 {baseDir}/scripts/generate.py \
  "Сделай меня в стиле аниме" \
  --format trend_photo --photo assets/me.jpg \
  --out assets/trend-scenario.json
# Then hand the JSON to image-generation:
# python3 image-generation/scripts/generate-image.py --scenario assets/trend-scenario.json

# Long educational video shooting script
python3 {baseDir}/scripts/generate.py \
  "How to choose your first bicycle" \
  --mode video --platform youtube --duration 120 --lang en \
  --out assets/bicycle_scenario.json
```

Output JSON — image mode

```json
{
  "title": "video title",
  "format": "wb_ad|reels|viral|long|postcard|educational|trend_photo",
  "platform": "tiktok|instagram|wb|youtube|vk",
  "language": "ru|en|...",
  "duration_sec": 30,
  "hook": "first 3 seconds: attention-grabbing phrase or action",
  "target_audience": "who watches this",
  "content_restrictions": "platform rules (aspect ratio, age restrictions, etc.)",
  "scenes": [
    {
      "id": 1,
      "duration_sec": 5,
      "visual_prompt": "ALWAYS IN ENGLISH: detailed prompt for gpt-image-1.5 or veo-3.1",
      "visual_type": "image|video_clip|text_only",
      "voiceover": "narration text in target language",
      "caption": "on-screen text in target language"
    }
  ],
  "image_request": {
    "prompt": "single image prompt for downstream image-generation",
    "reference_image_required": true,
    "reference_image_path": "/abs/path/to/photo.jpg"
  },
  "storyboard_grid_prompt": "NxN storyboard grid: all scenes as one image. null if no recurring subject.",
  "music_mood": "upbeat|calm|dramatic|funny|inspirational",
  "style_notes": "overall style and delivery notes",
  "asset_analysis": null
}
```

Output JSON — video mode

```json
{
  "title": "video title",
  "platform": "tiktok|instagram|wb|youtube|vk",
  "language": "ru|en|...",
  "duration_sec": 60,
  "hook": "first 3 seconds: what grabs attention",
  "target_audience": "who watches this",
  "scenes": [
    {
      "id": 1,
      "timecode": "00:00:00",
      "duration_sec": 5,
      "voiceover": "exact words spoken by narrator in target language",
      "action": "detailed English description of what is on screen: camera, subject, movement, lighting"
    }
  ],
  "music_mood": "upbeat|calm|dramatic|funny|inspirational",
  "style_notes": "overall visual style, pacing, tone"
}
```
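
Before handing a scenario to later pipeline stages, it can help to sanity-check it against the two output schemas above. This sketch makes assumptions about which keys are mandatory. It reports a mismatch between the summed scene durations and `duration_sec` as a problem. `check_scenario` is a hypothetical helper. The Cyrillic check is a cheap stand-in for the rule that visual_prompt is always English; it is not a real language test.

```python
import re

# Keys taken from the two output schemas; treating them all as mandatory is an
# assumption (optional fields such as image_request are not checked).
REQUIRED_TOP = {
    "image": {"title", "format", "platform", "language", "duration_sec", "scenes"},
    "video": {"title", "platform", "language", "duration_sec", "scenes"},
}
REQUIRED_SCENE = {
    "image": {"id", "duration_sec", "visual_prompt", "visual_type"},
    "video": {"id", "timecode", "duration_sec", "voiceover", "action"},
}
CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def check_scenario(data: dict, mode: str) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_TOP[mode] - set(data))]
    scenes = data.get("scenes", [])
    for scene in scenes:
        missing = REQUIRED_SCENE[mode] - set(scene)
        if missing:
            problems.append(f"scene {scene.get('id')}: missing {sorted(missing)}")
        # visual_prompt must be English; Cyrillic text is an obvious violation.
        if mode == "image" and CYRILLIC.search(scene.get("visual_prompt", "")):
            problems.append(f"scene {scene.get('id')}: visual_prompt contains Cyrillic text")
    # Soft check (assumption): scene durations should add up to the target.
    total = sum(s.get("duration_sec", 0) for s in scenes)
    if "duration_sec" in data and total != data["duration_sec"]:
        problems.append(f"scene durations sum to {total}s, expected {data['duration_sec']}s")
    return problems
```

To use it, load the file with `json.load` and call, for example, `check_scenario(data, "video")`. An empty list means no problems were found.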

Pipeline integration

Image mode output feeds into:

visual_prompt → image generation (gpt-image-1.5) or video (veo-3.1)
image_request.prompt + reference_image_path → image-generation/scripts/generate-image.py for trend-photo edits
voiceover → TTS (Pocket-TTS or ElevenLabs)
caption + duration_sec → ffmpeg montage (../ffmpeg-editing/SKILL.md)
Full JSON → orchestrator (../SKILL.md)
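
One way to carry `caption` + `duration_sec` into the montage step is an SRT subtitle track whose cue times accumulate scene by scene. This is an illustrative sketch, not necessarily how ../ffmpeg-editing/SKILL.md does it. `captions_to_srt` is a hypothetical name.

```python
def srt_time(seconds: float) -> str:
    """SRT timestamps use HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def captions_to_srt(scenario: dict) -> str:
    """Turn image-mode scene captions into an SRT subtitle track."""
    blocks, start, index = [], 0.0, 1
    for scene in scenario["scenes"]:
        end = start + scene["duration_sec"]
        caption = scene.get("caption", "").strip()
        if caption:  # scenes without a caption still advance the clock
            blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{caption}\n")
            index += 1
        start = end
    return "\n".join(blocks)
```

The resulting .srt file could then be burned in with ffmpeg's `subtitles` filter (for example `-vf subtitles=captions.srt`). That filter requires an ffmpeg build with libass.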

Video mode output feeds into:

_voiceover.txt → voice/voice_acting.py for speech synthesis
action per scene → video generation or director instructions
Full JSON → orchestrator (../SKILL.md)
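
On the consuming side, each `[HH:MM:SS] text` line gives a start time. The gap to the next start is the time the spoken segment has to fit into. The parser below is a hypothetical sketch of that step; voice/voice_acting.py's actual parsing is not documented here.

```python
import re

# Matches lines such as "[00:00:05] Second line".
LINE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*(.+)$")


def parse_voiceover(text: str) -> list[tuple[int, str]]:
    """Parse '[HH:MM:SS] text' lines into (start_seconds, text) pairs."""
    segments = []
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue  # skip blank or malformed lines
        h, m, s, words = match.groups()
        segments.append((int(h) * 3600 + int(m) * 60 + int(s), words))
    return segments


def slot_lengths(segments: list[tuple[int, str]], total_sec: int) -> list[int]:
    """Seconds each segment may last before the next one starts (or the video ends)."""
    starts = [start for start, _ in segments] + [total_sec]
    return [end - begin for begin, end in zip(starts, starts[1:])]
```
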
