# Stage 7: Grid Generation

Create a NanoBanana prompt for a storyboard grid. Write it to `nanobanana_prompt.txt`.

## Content Rating

All frame descriptions must be suitable for a **general audience (PG/12+)**. Portray peril and danger as heroic adventure — focus on drama and emotion, not graphic injury. No explicit blood, gore, or disturbing imagery in frame descriptions.

## Inputs

Read these files from the take directory (the Output Directory path starts with the take number, e.g. `001/`):
- **Script**: `{script_path}` — scenes, characters, actions, **and visual style**
- **Assets**: `{assets_path}` — character/location descriptions
- **Characteristic Shot**: `{char_shot_path}` — **VIEW THIS IMAGE** to understand the visual style and character appearances

**IMPORTANT: You MUST read/view the characteristic shot image file.** This is the character reference photo. Look at it carefully — note the characters' faces, clothing, body types, the lighting, color palette, and overall style.

## CRITICAL: Starting Frame Principle

Each frame in the grid is a **STARTING FRAME** — the video will progress FROM this moment.

The frame shows the moment **BEFORE** the scene's action begins:

| Scene Action | Starting Frame Shows |
|--------------|---------------------|
| "Hunter shoots rifle" | Hunter AIMING (no smoke, no recoil, finger on trigger) |
| "Character reacts in awe" | Character's NEUTRAL expression, about to witness |
| "Dog walks on water" | Dog at water's edge, paw lifted, about to step |
| "Truck crushes motorcycles" | Truck approaching, bikes intact |
| "Paul delivers punchline" | Paul turning toward Dave, mouth closed, about to speak |

**DO NOT show in starting frames:**
- Actions already happening (shooting, crushing, walking)
- Results of actions (smoke, crushed bikes, wet dog)
- Emotional states that develop DURING the scene (awe, shock, laughter)
- Dialogue being spoken (mouth open mid-word)

**WHY**: Video generators create motion FROM the starting frame. If you show the action already done, there's nowhere to go.

## Reference Image

The pipeline sends ONE reference image to NanoBanana: **the characteristic shot**.

This is the ONLY visual reference. Match the characters' appearance (faces, bodies, clothing) exactly as shown in this image.

## Visual Style

Read the `style` field from script.json to determine the visual style. Use this to set the Style section of your prompt:

- `"visual": "cinematic"` → **PHOTOREALISTIC**, like actual photographs from a film set, real human faces, real locations
- `"visual": "animated"` or `"visual": "cartoon"` → illustrated, animation style
- `"visual": "stylized"` → artistic interpretation

The style `reference` and `rationale` fields give additional context. If any style field mentions "realism", "documentary", "film", or a similar term, use the photorealistic style.
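
The mapping above can be sketched in Python. This is only an illustration: the function name `resolve_style`, the returned phrases, and the precedence (an explicit `visual` value wins, keywords are a fallback) are assumptions, and a fixed keyword list cannot cover the "or similar" cases.

```python
# Hypothetical helper: turns the script's style object into a short style phrase.
PHOTOREAL_HINTS = ("realism", "documentary", "film")


def resolve_style(style: dict) -> str:
    """Return a style phrase for the Style section of the prompt."""
    visual = str(style.get("visual", "")).lower()
    if visual == "cinematic":
        return "photorealistic"
    if visual in ("animated", "cartoon"):
        return "illustrated animation"
    if visual == "stylized":
        return "artistic interpretation"
    # Fallback (assumption): scan the free-text fields for realism hints.
    context = " ".join(
        str(style.get(key, "")) for key in ("reference", "rationale")
    ).lower()
    if any(hint in context for hint in PHOTOREAL_HINTS):
        return "photorealistic"
    return visual or "unspecified"
```

Treating the keyword check as a fallback keeps a phrase such as "animated film" in `rationale` from overriding an explicit `"visual": "animated"`.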

## Grid Layout

Count scenes in script.json:
- 8-9 scenes → 3x3 grid (label "Frame 1" through "Frame 8" or "Frame 9")
- 10-16 scenes → 4x4 grid (label "Frame 1" through "Frame 16")
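
As a minimal sketch, the scene-count rule could be written like this. The function name `grid_size` is hypothetical, and raising an error for counts outside 8-16 is an assumption, since this stage does not say what to do in that case.

```python
def grid_size(scene_count: int) -> int:
    """Return N for an NxN grid, based on the number of scenes."""
    if 8 <= scene_count <= 9:
        return 3
    if 10 <= scene_count <= 16:
        return 4
    # Assumption: counts outside 8-16 are not covered by this stage.
    raise ValueError(f"unsupported scene count: {scene_count}")
```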

For a 3x3 grid, frames are laid out as:
```
Row 1: Frame 1 (top-left)    | Frame 2 (top-center)    | Frame 3 (top-right)
Row 2: Frame 4 (middle-left) | Frame 5 (middle-center) | Frame 6 (middle-right)
Row 3: Frame 7 (bottom-left) | Frame 8 (bottom-center) | Frame 9 (bottom-right)
```

For a 4x4 grid, frames are laid out as:
```
Row 1: Frame 1  | Frame 2  | Frame 3  | Frame 4
Row 2: Frame 5  | Frame 6  | Frame 7  | Frame 8
Row 3: Frame 9  | Frame 10 | Frame 11 | Frame 12
Row 4: Frame 13 | Frame 14 | Frame 15 | Frame 16
```
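
Both layouts number frames left to right, then top to bottom. A small sketch of that mapping, using a hypothetical helper name `frame_position`:

```python
def frame_position(frame: int, n: int) -> tuple[int, int]:
    """Return the 1-based (row, column) of a frame in an n x n grid."""
    if not 1 <= frame <= n * n:
        raise ValueError(f"frame {frame} does not fit a {n}x{n} grid")
    # Frames fill each row left to right before moving down a row.
    row, col = divmod(frame - 1, n)
    return row + 1, col + 1
```

For example, `frame_position(6, 3)` gives `(2, 3)`, the middle-right cell, and `frame_position(13, 4)` gives `(4, 1)`.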

## Previous Iterations

Check `{take}/07-grids/` for previous iterations (directories like `001/`, `002/`, etc.).
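
One way to list those iteration directories, sketched with `pathlib`. The helper name `previous_iterations` is hypothetical, and treating only all-digit directory names as iterations is an assumption based on the `001/`, `002/` pattern.

```python
from pathlib import Path


def previous_iterations(grids_dir: Path) -> list[Path]:
    """Return numbered iteration directories, oldest first."""
    if not grids_dir.is_dir():
        return []  # no grids directory yet, so no earlier iterations
    return sorted(
        (p for p in grids_dir.iterdir() if p.is_dir() and p.name.isdigit()),
        key=lambda p: int(p.name),
    )
```

Called as `previous_iterations(Path("001/07-grids"))` for take `001`, it returns an empty list when nothing has been generated yet.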

## Variation Mode (Batch Generation)

When the prompt includes `## VARIATION MODE`, you are generating ONE of MULTIPLE grids.

**Each grid must be DIFFERENT from others.** Vary:

1. **Camera Angles**: Mix wide/medium/close-up across frames
2. **Character Positions**: Characters on left vs right, foreground vs background
3. **Starting Moments**: Different interpretations of "just before" the action
4. **Composition**: Symmetric vs asymmetric, centered vs rule-of-thirds, Dutch angle

**Example: Scene "Hunter shoots rifle"**
- Grid 1: Wide shot, hunter small in frame, dramatic landscape
- Grid 2: Medium shot, hunter prominent, focus on aiming posture
- Grid 3: Close-up, hunter's face and rifle, intense concentration

The user will review all grids and select the best interpretation. Maximize visual variety while maintaining story accuracy.

## generation_config.json

You MUST write `generation_config.json` with:

1. **Generation mode info** (provided in prompt above)
2. **frame_prompts array** — YOUR interpretation of each scene's starting frame

```json
{
  "generation_mode": "new",
  "variation_index": 1,
  "total_variations": 3,
  "reasoning": "This variation emphasizes dramatic wide angles and environmental context",
  "frame_prompts": [
    {
      "scene": 1,
      "frame_prompt": "Wide shot of calm autumn lake at golden hour. Golden retriever at water's edge, one paw lifted above the water surface, about to step. Black labrador beside it, both dogs alert. Autumn trees with orange leaves reflected in still water.",
      "starting_moment": "Dog about to step onto water",
      "angle": "Wide establishing, elevated"
    },
    {
      "scene": 2,
      "frame_prompt": "Medium shot of two hunters standing at lakeside. Dave (green cap, tan vest) on left with shotgun at rest. Paul (dark jacket, gray stubble) on right, arms relaxed. Both looking toward the lake. Golden hour light.",
      "starting_moment": "Before first duck spotted",
      "angle": "Eye-level medium"
    }
  ]
}
```

**The `frame_prompts` array captures YOUR creative interpretation for THIS grid:**
- Each grid has its own interpretation of starting moments
- Each grid has its own camera angles and compositions
- The user selects the grid that best matches their vision
- The selected grid's `frame_prompts` flow to the script-adapted stage

## How to Write the NanoBanana Prompt

**Keep it simple and natural.** NanoBanana works best with conversational prompts, not technical specifications.

### Prompt Structure

```
A [style] [N]x[N] storyboard grid presenting a sequential [genre/tone] story. Each of the [frame count] frames must be vertical 9:16 portrait format.

The reference image shows the exact characters for this story. Match their faces, body types, hair, and clothing precisely in every frame.

[2-3 sentence story summary]

Characters (match reference image exactly):
- [NAME]: [brief appearance — what you SEE in the characteristic shot]
[...]

Frame 1: [shot type]. [description]
Frame 2: [shot type]. [description]
[...all frames...]

Style: [visual style from script.json]. All frames vertical 9:16 portrait filling the entire cell — no black bars, no letterboxing.
```

### Example: Fresh Generation with Extremely Detailed Prompts

```
A cinematic 3x3 storyboard grid presenting a sequential dry comedy story. Each of the 9 frames must be vertical 9:16 portrait format (tall and narrow like a phone screen held vertically).

The reference image shows the exact characters for this story. Match their faces, body types, hair, and clothing precisely in every frame.

Two hunters and their dogs go duck hunting at an autumn lake. The golden retriever miraculously walks on top of the water surface to retrieve a shot duck — paws on the lake like solid ground, completely dry — while the black labrador swims desperately. The amazed owner can't get a reaction from his pessimist friend, who simply concludes the dog can't swim.

Characters (match reference image exactly):
- DAVE: younger hunter from reference — early 40s, clean-shaven, green baseball cap, tan hunting vest over olive-brown shirt, animated expressive face, medium build
- PAUL: older hunter from reference — late 50s, weathered face with gray stubble, dark navy jacket, stern stone-faced frown, stockier heavier build
- GOLDEN RETRIEVER: golden-blonde dog from reference — thick fluffy coat, walks supernaturally on water surface, always completely dry
- BLACK LABRADOR: black dog from reference — sleek short coat, swims normally, gets soaking wet

Frame 1: Wide establishing shot. Calm misty autumn lake at golden hour dawn. The golden retriever walks on TOP of the still water surface toward camera, its paws resting on the water as if it were solid ground, carrying a duck in its mouth, fur completely dry. The black labrador swims beside it frantically, only its head above water, paddling hard. Autumn trees with orange and gold leaves reflected in the lake surface. Vertical 9:16 composition — lake fills frame top to bottom.

Frame 2: Medium shot at autumn lakeside. DAVE (green cap, tan vest) stands on the LEFT side of frame holding his shotgun, looking up at the sky eagerly, his golden retriever sitting alertly at his feet. PAUL (dark navy jacket, gray stubble) stands on the RIGHT side holding his shotgun lazily, looking bored and skeptical, his black labrador at his feet. Both men clearly visible holding guns. Vertical portrait framing — full bodies from knees up.

Frame 3: Medium-wide shot. DAVE (on LEFT) and PAUL (on RIGHT) at the lakeside, both shotguns raised and aimed toward the lake, fingers on triggers, no smoke. Both dogs alert beside their owners, ears perked.

Frame 4: Wide shot from the shore looking across the autumn lake. Two dogs rushing AWAY from camera across the water. The golden retriever RUNS on TOP of the water surface, completely dry. The black labrador SWIMS beside it, only head above water.

Frame 5: Medium close-up of both men's faces and upper bodies. DAVE (on LEFT) leaning forward with his mouth WIDE open, eyes BULGING, jaw DROPPED in utter amazement, turning his head toward PAUL. PAUL (on RIGHT) standing with arms TIGHTLY crossed over his chest, staring straight ahead with a completely FLAT, BORED, STONE-FACED expression, showing zero interest or surprise. Lake blurred in background. Maximum contrast between their reactions.

Frame 6: Medium shot on the lake. The golden retriever RUNNING on TOP of the water surface directly toward camera, a duck gripped firmly in its mouth, fur completely dry and fluffy. The black labrador SWIMMING desperately behind it, head above water, tongue out, exhausted and soaking wet. Open water all around them.

Frame 7: Vertical portrait of DAVE alone inside the truck cab. Close-up of DAVE's face and upper body — green cap, tan vest visible, his hands gripping the steering wheel at the BOTTOM of frame. He's turned slightly toward the camera with an eager, excited, hopeful grin on his face, eyebrows raised expectantly. Autumn trees blurred through the windshield behind him. Framed like a tall vertical phone photo — his face in upper third, steering wheel hands in lower third. SINGLE PERSON SHOT — Paul not visible.

Frame 8: Close-up inside the truck cab, vertical portrait framing. RIGHT-HAND DRIVE vehicle — steering wheel is on the RIGHT side. DAVE sits on the RIGHT side of the truck behind the steering wheel, his jaw DROPPED open in stunned disbelief. PAUL sits on the LEFT side as passenger, turned slightly toward DAVE with his signature deadpan FLAT expression. Dashboard and windshield visible. Dave driving, Paul passenger — but DAVE is on the RIGHT because it's right-hand drive.

Frame 9: Wide shot from behind. The pickup truck driving AWAY from camera down a dirt road through an autumn forest. Golden-orange trees forming a canopy overhead, leaves scattered on the road. Warm golden sunlight filtering through the trees. The truck small in the frame, surrounded by fall colors.

Style: cinematic, photorealistic, Coen Brothers deadpan comedy aesthetic, warm autumn golden-hour color palette with rich oranges and golds, natural soft lighting, subtle film grain, shallow depth of field on close-ups. Every single frame must be vertical 9:16 portrait format filling the entire cell with image content — no black bars, no letterboxing.
```

### Rules

1. **Use character NAMES** from script/assets. Never "the older man" — always "PAUL."
2. **No dialogue or quotes.** Describe expressions and body language instead.
3. **Use negatives sparingly.** You CAN mention what's absent (e.g., "no duck in mouth"), but keep it brief. Don't describe what's outside the frame or explain why something is absent.
4. **No metadata block.** Just the prompt text, nothing after it.
5. **Describe ALL frames.** NanoBanana generates from scratch each time.
6. **All frames vertical 9:16**, filled with image content.
7. **Match the characteristic shot.** Characters must look like they do in the reference image.
8. **STARTING FRAMES only.** Each frame shows the moment BEFORE the action begins. No action in progress, no results of action, no mid-reaction expressions.
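
A few of these rules can be checked mechanically before the prompt is written to disk. The sketch below is a heuristic, not part of the pipeline: `lint_prompt` is a hypothetical name, it covers only rules 2, 5, and 6, and flagging any double quotation mark as possible dialogue is an assumption that may produce false positives.

```python
import re


def lint_prompt(prompt: str, frame_count: int) -> list[str]:
    """Return a list of likely rule violations in a prompt (empty if none found)."""
    problems = []
    # Rule 5: every frame needs its own "Frame N:" line.
    for i in range(1, frame_count + 1):
        if not re.search(rf"^Frame {i}:", prompt, flags=re.MULTILINE):
            problems.append(f"missing description for Frame {i}")
    # Rule 6: the vertical format should be stated somewhere in the prompt.
    if "9:16" not in prompt:
        problems.append("no mention of vertical 9:16 format")
    # Rule 2: quotation marks often signal dialogue (heuristic).
    if re.search(r'["\u201c\u201d]', prompt):
        problems.append("contains quotation marks (possible dialogue)")
    return problems
```

The starting-frame rule (rule 8) is not covered here, because judging whether a description shows the moment before the action needs a human or model review.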

### CRITICAL: Frame Description Guidelines

**Focus on what IS in the frame.** Be specific and concrete. You can briefly mention key absences (e.g., "empty mouth"), but don't over-explain or describe what's outside the frame.

Be specific about:
1. **Character positioning**: "DAVE on left, PAUL on right"
2. **Physical actions**: "hands gripping steering wheel", "arms crossed", "mouth open panting"
3. **Camera framing**: "Close-up", "Wide shot", "Medium shot"
4. **Vertical composition**: Explicitly describe vertical framing for scenes that might default horizontal

**Vehicle Interiors:**
- Use SINGLE-CHARACTER portraits when possible
- If two people: describe from BEHIND through windshield

**Reaction Shots:**
- Be EXTREME with expressions: "jaw DROPPED, eyes BULGING"
- Contrast clearly: "completely STONE-FACED, arms crossed"

## Also Write

**`user_message.txt`**: a brief message to the user about what you generated (2-3 sentences). In variation mode, explain what makes THIS grid different from others.

**`generation_config.json`**: REQUIRED fields:
- `generation_mode`: "new", "refine", or "chat"
- `reasoning`: Why you made these creative choices
- `frame_prompts`: Array of frame_prompt objects for EACH scene (see format above)
- `variation_index` and `total_variations`: If in variation mode
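
The required fields can be sanity-checked before writing the file. This is a minimal sketch under stated assumptions: `check_config` and its parameters are hypothetical names, each `frame_prompts` entry is assumed to be an object with a `scene` number as in the example earlier, and scenes are assumed to be numbered from 1.

```python
REQUIRED_FIELDS = ("generation_mode", "reasoning", "frame_prompts")
VARIATION_FIELDS = ("variation_index", "total_variations")
VALID_MODES = ("new", "refine", "chat")


def check_config(config: dict, scene_count: int, variation_mode: bool = False) -> list[str]:
    """Return a list of problems found in a generation_config dict."""
    problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in config]
    if "generation_mode" in config and config["generation_mode"] not in VALID_MODES:
        problems.append(f"unknown generation_mode: {config['generation_mode']!r}")
    if "frame_prompts" in config:
        # Every scene should have exactly one entry (order is not checked).
        scenes = {entry.get("scene") for entry in config["frame_prompts"]}
        if scenes != set(range(1, scene_count + 1)):
            problems.append(f"frame_prompts should cover scenes 1 to {scene_count}")
    if variation_mode:
        problems += [f"missing field: {key}" for key in VARIATION_FIELDS if key not in config]
    return problems
```

An empty list means the structure looks complete; it says nothing about the quality of the frame descriptions themselves.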
