# Stage 2: Script Generation

You are creating a detailed scene-by-scene script for a short-form video (45-90 seconds, depending on grid size).

## Content Rating

All scene descriptions must be suitable for a **general audience (PG/12+)**. Peril and danger are fine as dramatic beats, but portray them tastefully — focus on the character's courage and the dramatic situation, not graphic detail. No explicit blood, gore, or disturbing imagery.

## Your Task

Read the story understanding and create a complete video script. Write to `script.json` in this directory.

## Previous Stage Outputs

**Story Understanding**: `../02-understanding/current/understanding.md`

Read this carefully - it contains:
- Story analysis
- **3-Second Hook** (CRITICAL - use this for opening)
- Visual potential analysis
- Dialogue recommendations
- Technical considerations

## Script Requirements

### Overall Structure

- **Target duration**: 45-90 seconds total (depends on grid size)
- **Scene count**: **USE ALL GRID SLOTS** — 9 scenes for 3x3 grid, 16 scenes for 4x4 grid
- **Scene duration**: MUST match video generator constraints (specified above)
- **First scene MUST be the 3-second hook** from understanding
- **Last scene MUST be a satisfying ending** — don't leave the story hanging
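The count and duration rules above can be checked mechanically. The following is a minimal sketch, assuming the grid is given as a string like `3x3` and the script is already loaded as a dict; the helper names `expected_scene_count` and `check_structure` are hypothetical and not part of the pipeline. The 45-90s range is taken directly from the target above and may need adjusting for a particular generator.

```python
def expected_scene_count(grid: str) -> int:
    """Return the number of slots in a grid string such as '3x3' or '4x4'."""
    rows, cols = (int(part) for part in grid.lower().split("x"))
    return rows * cols


def check_structure(script: dict, grid: str) -> list[str]:
    """Collect structural problems in a list so every issue is reported at once."""
    problems = []
    scenes = script.get("scenes", [])

    # Every grid slot must be filled by exactly one scene.
    expected = expected_scene_count(grid)
    if len(scenes) != expected:
        problems.append(
            f"expected {expected} scenes for a {grid} grid, found {len(scenes)}"
        )

    # Total runtime is the sum of the per-scene durations.
    total = sum(scene.get("duration", 0) for scene in scenes)
    if not 45 <= total <= 90:
        problems.append(f"total duration {total}s is outside the 45-90s target")

    return problems
```

An empty list means the script passes both checks; anything else names the rule that was broken.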

### Story Arc: Beginning, Middle, End

**Your video needs a complete arc using ALL available scenes:**

1. **Opening (Scene 1)**: The 3-second hook — grab attention immediately
2. **Middle (Scenes 2 to N-2)**: Develop the story, build tension, show the journey
3. **Ending (Last 2 scenes)**: Satisfying conclusion — punchline, reveal, emotional payoff, or dramatic closure

**Common ending types:**
- **Comedy**: Punchline delivery + "aftertaste" scene (reaction, wide shot, awkward silence)
- **Drama**: Emotional resolution, character realization, meaningful look
- **Action**: Final confrontation result, aftermath, hero moment
- **Mystery**: The reveal + "oh no" moment (twist landing)

**CRITICAL: The "Aftertaste" Scene**

After your punchline or climax, add a FINAL scene that gives viewers time to process:
- **Wide shot pulling away** — camera slowly moves back, lets the moment breathe
- **Reaction shot** — character processing what just happened
- **Awkward silence** — the tension lingers, no dialogue, just atmosphere
- **Environmental shot** — the world continues while viewer absorbs the moment

Example (Flying Alligators): After the sergeant admits alligators "do get up off the ground a little bit", the final scene is a wide shot slowly pulling away from the training field — no dialogue, just the awkward aftermath. This lets the comedy land.

**WHY THIS MATTERS**: Videos that end immediately after the punchline feel rushed. The viewer needs 3-5 seconds of "aftertaste" to appreciate what just happened.

**Use ALL available scenes**: Don't squeeze a story into fewer scenes than available. More scenes = better pacing, more visual variety, and proper buildup to the ending. If the story feels thin, add:
- Establishing shots (location, atmosphere)
- Reaction shots (character responses)
- Dramatic pauses (tension building)
- Cutaway shots (environment details)

### CRITICAL: Why Each Scene = One Frame (NO MONTAGES)

**Technical constraint**: AI video generators cannot do quick-cut montages or multiple shots in one clip.

**The Grid Approach**:
1. We generate ONE grid image containing all scene frames (like a comic book page)
2. Each frame in the grid becomes ONE scene/clip
3. Each clip is generated separately from its single reference frame

**This means:**
- ❌ FORBIDDEN: "Quick-cut montage: (A) Duck falls, (B) Dog retrieves, (C) Paul watches"
- ✅ CORRECT: Three separate scenes, each with ONE action
  - Scene 3: "Duck falls from sky"
  - Scene 4: "Dog walks on water to retrieve duck"
  - Scene 5: "Paul watches with stone face"

**Why this matters:**
- Each scene will be generated as a separate video clip from a single reference frame
- The AI cannot smoothly transition between multiple actions within one clip
- A montage would need several clips inside one grid slot, but each slot produces exactly one clip from one frame

**Bottom line**: If you write "montage" or describe multiple quick actions, it won't work. Break into separate scenes instead.
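A simple keyword scan can catch the most obvious violations before the script moves on. This is a rough sketch, not an exhaustive detector: the word list in `MONTAGE_PATTERN` and the function name `find_montage_scenes` are assumptions for illustration, and a scene that describes several actions without using any of these words will not be flagged.

```python
import re

# Hypothetical word list; extend it with whatever phrasing slips through in practice.
MONTAGE_PATTERN = re.compile(
    r"\b(montage|quick[- ]cuts?|intercut|split[- ]screen)\b",
    re.IGNORECASE,
)


def find_montage_scenes(scenes: list[dict]) -> list[int]:
    """Return the numbers of scenes whose text fields mention montage-style editing."""
    flagged = []
    for scene in scenes:
        # Scan every free-text field that ends up influencing the video prompt.
        text = " ".join(
            str(scene.get(field, ""))
            for field in ("action", "what_is_shown", "video_prompt")
        )
        if MONTAGE_PATTERN.search(text):
            flagged.append(scene["number"])
    return flagged
```

Any flagged scene should be split into separate single-action scenes, as in the duck/dog/Paul example above.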

### Scene Design Principles

1. **ONE action per scene**
   - Not: "Character walks in, sits down, and starts talking"
   - Yes: "Character walks into diner" (one scene), "Character sits at counter" (next scene)

2. **Literal visual descriptions** - what_is_shown is the basis for the VIDEO GENERATION PROMPT (video_prompt)
   - Not: "Character feels defeated"
   - Yes: "Character's shoulders slump, eyes downcast, slow walk"
   - **CRITICAL**: what_is_shown must be COMPLETE and STANDALONE
   - Include environment, characters, actions, AND dialogue in what_is_shown
   - Don't write "He delivers the line" - write what he actually says!

3. **Simple camera movements**
   - static: No movement
   - slow-push: Slow move toward subject
   - slow-pull: Slow move away
   - orbit: Slow circular movement
   - Avoid complex movements

4. **Dialogue kept short**
   - 1-2 sentences maximum per scene
   - Only if a character speaks in that moment
   - Specify delivery style: urgent, calm, comedic, dry, mocking, etc.

5. **Separate dialogue from ambient audio**
   - dialogue: What character says
   - audio_ambient: Background sounds, music, environmental audio

### Visual Style Selection

Choose from creative references in CLAUDE.md or create your own approach. Consider:
- Story tone (comedy, drama, horror)
- Setting and time period
- Target audience
- The 3-second hook (what style enhances it?)

Examples:
- Dark revenge comedy → noir or Coen Brothers style
- Physical comedy → cartoon or meme energy
- Dramatic story → cinematic or Tarantino style
- Wholesome story → retro_50s or Wes Anderson aesthetic

### Scene Structure

For each scene, provide:

1. **number**: Sequential numbering (1, 2, 3...)
2. **duration**: MUST match video generator constraints (see top of prompt)
   - For Kling 2.6: Use 5s for simple actions, 10s for dialogue or complex actions
   - For Veo3: Always 8s
3. **action**: Single clear action happening in this scene (the FULL action)
4. **what_is_shown**: Human-readable scene description
   - Full environment/setting, character details, actions
   - Include dialogue if present
   - Describe the COMPLETE action from start to finish
5. **video_prompt**: Optimized prompt for VIDEO CLIP generation (Kling/Veo3)
   - Include motion, camera movement, progression of action
   - For Kling with audio: include actual dialogue words and delivery style
   - Example: "Interior of pickup truck cab driving through autumn forest. Paul turns toward Dave with flat expression and says 'He can't swim' in deadpan monotone. Dave's expression freezes then shifts to stunned disbelief, mouth dropping open. Camera static, warm afternoon light, autumn trees slowly passing outside windows."
6. **shot_type**: wide, medium, close-up, extreme-close-up
7. **camera**: static, slow-push, slow-pull, orbit
8. **dialogue** (if applicable) - structured data for tracking:
   - character: Character ID (matches assets)
   - text_video: Words for VIDEO GENERATOR - use [pause] markers for pauses, NOT ellipsis (...). Ellipsis causes Kling to switch speakers mid-line!
     Example: `"Yeah." [pause] "He can't swim."` NOT `"Yeah... he can't swim"`
   - text_caption: Words for CAPTIONS - with proper punctuation, ellipsis as written
     Example: `"Yeah... he can't swim."`
   - delivery: How it's said (urgent, calm, comedic, dry, etc.)
9. **audio_ambient**: Background sounds, environmental audio (for video_prompt context)
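The per-scene constraints listed above lend themselves to a small validator. The sketch below assumes the generator is identified by a short label (`"kling"` or `"veo3"`, both hypothetical keys) and that each scene is a dict shaped like the fields above; `check_scene` is an illustrative name, not an existing pipeline function.

```python
# Allowed durations per generator, taken from the rules above.
# The dictionary keys are hypothetical labels, not official identifiers.
ALLOWED_DURATIONS = {"kling": {5, 10}, "veo3": {8}}
SHOT_TYPES = {"wide", "medium", "close-up", "extreme-close-up"}
CAMERAS = {"static", "slow-push", "slow-pull", "orbit"}


def check_scene(scene: dict, generator: str) -> list[str]:
    """Return a list of rule violations for a single scene (empty if none)."""
    problems = []

    if scene.get("duration") not in ALLOWED_DURATIONS[generator]:
        problems.append(f"duration {scene.get('duration')} not allowed for {generator}")
    if scene.get("shot_type") not in SHOT_TYPES:
        problems.append(f"unknown shot_type {scene.get('shot_type')!r}")
    if scene.get("camera") not in CAMERAS:
        problems.append(f"unknown camera {scene.get('camera')!r}")

    # text_video must use [pause] markers; both ASCII and Unicode ellipses are rejected.
    dialogue = scene.get("dialogue")
    if dialogue:
        text_video = dialogue.get("text_video", "")
        if "..." in text_video or "\u2026" in text_video:
            problems.append("text_video contains an ellipsis; use [pause] instead")

    return problems
```

Running this over every scene before writing `script.json` surfaces duration and ellipsis mistakes early, when they are cheapest to fix.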

### IMPORTANT: No frame_prompt Here

**NOTE**: `frame_prompt` is NOT generated in this stage.

The Grid Stage (Stage 7) creates `frame_prompt` for each scene with multiple visual interpretations:
- Different camera angles
- Different starting moments
- Different character positions

This allows the user to choose grids based on visual interpretation, not just quality.

### Starting Moment Principle

The `action` and `what_is_shown` fields describe the FULL scene action.
The Grid Stage will interpret the STARTING MOMENT - the frame BEFORE action begins.

The video will PROGRESS from the starting frame through the action you describe.

Examples:
- action: "Hunter shoots rifle at duck"
  → Grid creates starting frame: "Hunter aiming rifle at sky, finger on trigger, no smoke"
  → Video progresses: aiming → shooting → recoil

- action: "Character reacts in awe"
  → Grid creates starting frame: "Character facing scene, expression neutral/anticipating"
  → Video progresses: neutral → surprise → awe

- action: "Dog walks on water to retrieve duck"
  → Grid creates starting frame: "Dog at water's edge, one paw lifted, about to step"
  → Video progresses: stepping → walking on water → reaching duck

### Critical: The Opening

**Scene 1 MUST be the 3-second hook from understanding.md**

This is your viewer retention moment. Use the exact hook described in understanding, or adapt it slightly if you have a better execution idea.

## Output Format

Write to `script.json`:

```json
{
  "title": "Story Title",
  "style": {
    "visual": "noir",
    "genre": "comedy",
    "tone": "dark-comedy",
    "reference": "Coen Brothers aesthetic",
    "rationale": "Why this style fits the story"
  },
  "target_duration": 60,
  "scenes": [
    {
      "number": 1,
      "duration": 8,
      "action": "Cigarette pushed into pie",
      "what_is_shown": "Extreme close-up of lit cigarette being slowly pushed into slice of pie, smoke rising, ember glowing against the creamy filling",
      "video_prompt": "Extreme close-up of lit cigarette being slowly pushed down into slice of cream pie, smoke rising, ember glowing brighter against the creamy filling, creating sizzling effect. Camera static. Tense atmosphere.",
      "shot_type": "extreme-close-up",
      "camera": "static",
      "dialogue": null,
      "audio_ambient": "Sizzling sound, diner ambience, tense music sting"
    },
    {
      "number": 2,
      "duration": 8,
      "action": "Reveal intimidating biker",
      "what_is_shown": "Interior of roadside diner. Camera pulls back to reveal leather-clad biker holding cigarette, smirking down at pie on counter. Other bikers in leather jackets visible behind him. The biker says 'Oops' with a mocking, cruel laugh.",
      "video_prompt": "Interior of roadside diner. Camera slowly pulls back from close-up of pie to reveal leather-clad biker holding cigarette, smirking down at the damaged pie. He says 'Oops' with a mocking, cruel laugh. Other bikers visible behind him, also smirking. Heavy boots shuffle, leather creaks. Menacing atmosphere.",
      "shot_type": "medium",
      "camera": "slow-pull",
      "dialogue": {
        "character": "biker_1",
        "text_video": "Oops",
        "text_caption": "Oops.",
        "delivery": "mocking, cruel laugh"
      },
      "audio_ambient": "Heavy boots, leather creaking, diner ambience, menacing atmosphere"
    }
  ],
  "characters": [
    {
      "id": "old_man",
      "appears_in": [1, 2, 5, 6],
      "speaking_in": []
    },
    {
      "id": "biker_1",
      "appears_in": [1, 2],
      "speaking_in": [2]
    }
  ],
  "total_duration": 58
}
```

Example `dialogue` values for a line with a pause:
- `text_video`: `"Yeah." [pause] "He can't swim."`
- `text_caption`: `"Yeah... he can't swim."`

**NOTE**: `frame_prompt` is not included here — it's generated by the Grid Stage (Stage 7).
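The bookkeeping fields in this format (`number`, `total_duration`, `appears_in`, `speaking_in`) can be cross-checked against the scenes themselves. The sketch below is one possible check, under the assumption that `total_duration` should equal the sum of scene durations and that `speaking_in` should list exactly the scenes where that character has a `dialogue` entry; neither rule is stated explicitly above, and `check_bookkeeping` is a hypothetical helper. The abbreviated two-scene example above would not pass the duration check, since it omits most scenes.

```python
def check_bookkeeping(script: dict) -> list[str]:
    """Cross-check scene numbering, total duration, and character tracking."""
    problems = []
    scenes = script["scenes"]

    # Scene numbers should run 1, 2, 3, ... with no gaps.
    numbers = [scene["number"] for scene in scenes]
    if numbers != list(range(1, len(scenes) + 1)):
        problems.append("scene numbers are not sequential starting at 1")

    # Assumption: total_duration is the sum of the per-scene durations.
    total = sum(scene["duration"] for scene in scenes)
    if script.get("total_duration") != total:
        problems.append(f"total_duration should be {total}")

    # Map each character id to the scenes where it actually has dialogue.
    speakers: dict[str, set[int]] = {}
    for scene in scenes:
        dialogue = scene.get("dialogue")
        if dialogue:
            speakers.setdefault(dialogue["character"], set()).add(scene["number"])

    known_ids = set()
    for character in script.get("characters", []):
        known_ids.add(character["id"])
        listed = set(character.get("speaking_in", []))
        if listed != speakers.get(character["id"], set()):
            problems.append(f"speaking_in mismatch for {character['id']}")
        if not listed <= set(character.get("appears_in", [])):
            problems.append(f"{character['id']} speaks in a scene missing from appears_in")

    for speaker in sorted(set(speakers) - known_ids):
        problems.append(f"{speaker} has dialogue but is missing from characters")

    return problems
```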

## Important Notes

1. **Use understanding.md analysis** - it identified what works visually
2. **Start with the hook** - scene 1 must grab attention in 3 seconds
3. **End with closure** - final scene must feel like a real ending, not a cutoff
4. **Use ALL grid slots** - 9 for 3x3, 16 for 4x4 — don't leave slots unused
5. **Build to punchline** - final scenes should deliver emotional payoff
6. **Keep it simple** - AI video generators work better with clear, single actions
7. **Match durations to action** - don't rush or drag
8. **Dialogue delivery matters** - specify HOW lines are said
9. **Track characters** - note which scenes they appear/speak in

## Common Mistakes to Avoid

- **Not using all scenes**: If grid has 9 slots, write 9 scenes — don't waste slots!
- **Abrupt ending**: Final scene must provide closure, not just stop mid-story
- **Incomplete what_is_shown**: "He delivers the line" - what line? Include the actual dialogue!
- **Missing environment context**: "Trees pass by" - where? In a car? Walking? Include full setting!
- **Compound actions**: "walks in AND sits down AND starts talking" - break into separate scenes
- **Vague descriptions**: "Character looks upset" - describe physical signs (shoulders slump, eyes downcast)
- **Forgetting the hook**: Scene 1 must stop scrolling
- **Dialogue too long**: Keep to 1-2 sentences per scene
- **Complex camera work**: Stick to simple movements (static, slow-push, slow-pull, orbit)

## User Message File (user_message.txt)

After creating `script.json`, write `user_message.txt` — a friendly message to the user.

This message should explain (2-3 paragraphs):

**1. What you created:**
- Visual style chosen and why it fits the story
- Number of scenes and total duration
- How scene 1 implements the 3-second hook
- Key storytelling choices

**2. Challenges or considerations:**
- Any scenes that might be tricky to generate
- Technical limitations you worked around
- Pacing or timing decisions

**3. Specific suggestions for user:**

When to **Accept**:
- The script captures the story essence correctly
- The visual style feels appropriate
- Scene breakdown and pacing make sense
- You're ready to proceed to asset identification

When to **Refine** (examples):
- "Scene [X] should be longer/shorter to match the action"
- "The visual style should be more [dramatic/comedic/mysterious]"
- "Add a scene showing [specific moment] that's missing"
- "Remove/combine scenes [X] and [Y] to tighten pacing"
- "The dialogue in scene [X] should be [different tone/delivery]"
- "The 3-second hook should start with [different visual]"

When to **Regenerate** (examples):
- "The overall pacing is wrong — needs to be faster/slower"
- "The visual style doesn't match the story tone at all"
- "The scene breakdown misses key story beats"
- "The dialogue doesn't sound natural for these characters"
- "Start from scratch with a completely different structure"

**Tone**: Be conversational and specific. Mention scene numbers, characters, and actual story moments.

**Example message**:
```
I've created a 9-scene, 60-second script with a noir visual style. The opening uses the dog-on-water spectacle as the hook (Scene 1: wide shot of the dog trotting across the lake surface). I broke down the hunting sequence into 3 quick scenes to show repetition without burning time, then moved to the truck conversation for Dave's setup and Paul's punchline.

The trickiest scene will be Scene 1: the water-walking needs to look supernatural, not like shallow water. I kept the dog's action simple (just walking, not running or splashing) to give the AI the best chance. Scene 8 (the punchline) combines Paul's deadpan audio delivery with the delayed caption strategy from understanding.

**Accept** if the structure works and the noir style fits your vision. **Refine** if you want: different pacing (maybe shorten hunting scenes 2-4?), different visual style (comedy might work better than noir?), or want to emphasize Dave's reactions more. **Regenerate** if this feels too slow, the scene breakdown doesn't work, or you want a completely different approach to the story.
```

Check that JSON is valid before finishing!
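
One way to run that final check is to parse the file with a strict JSON parser, which rejects comments and trailing commas. The sketch below uses Python's standard `json` module; `load_script` is a hypothetical helper name, and the only structural assumption is that the top-level value is an object with a `scenes` key, as in the format above.

```python
import json


def load_script(text: str) -> dict:
    """Parse script text, raising ValueError with a location on invalid JSON."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        # Report where parsing failed so the offending line is easy to find.
        raise ValueError(
            f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err

    if not isinstance(data, dict) or "scenes" not in data:
        raise ValueError("script must be a JSON object with a 'scenes' array")
    return data
```

Passing the contents of `script.json` to this function either returns the parsed script or points at the first character the parser could not accept.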
