# Stage 6: Characteristic Shot - Reference Image for Grid

You are creating a detailed NanoBanana prompt for generating ONE master reference image that establishes the visual direction AND will be referenced in grid generation for consistency.

## Content Rating

All visual descriptions must be suitable for a **general audience (PG/12+)**. Portray characters in heroic, adventurous poses. No graphic injury or disturbing imagery.

## Your Task

Read the script and assets, then write the NanoBanana prompt to `nanobanana_prompt.txt` in this directory. Also write `user_message.txt` (see Output Files below).

## Previous Stage Outputs

**Script**: `../03-script/current/script.json`
**Assets**: `../04-assets/current/assets.json`
**References**: `../05-references/*/selected.jpg` (if needs_references=true)

Read all of these carefully.

## Dual Purpose of Characteristic Shot

### Purpose 1: VALIDATION GATE
- Validates visual style before committing to full grid
- If it looks wrong, iterate HERE (cheap) not after grid (expensive)
- User must approve before proceeding

### Purpose 2: REFERENCE IMAGE FOR GRID
- **Grid prompt will reference this image**: "SAME old man as in characteristic shot", "SAME diner as in reference"
- **Must show most important recurring elements** so grid can maintain consistency
- Characters, locations, objects that appear in MULTIPLE scenes

## What to Include - Priority System

### Priority 1: MOST RECURRING CHARACTER(S)

Include ALL characters that appear in the episode:
- Count `appears_in` scenes for each character
- Show ALL named characters — the characteristic shot is the ONLY reference for grid generation
- Characters without reference sheets MUST be described in detail (face, build, clothing, distinguishing features)
- Characters WITH reference sheets: use their asset ID, describe only pose/action
- The image is 16:9 landscape — there's room for 2-4 characters comfortably

For truck-stop-revenge analysis:
- old_man: 4 scenes → **INCLUDE**
- biker_1: 5 scenes → **INCLUDE** (or at least visible in background)
- waitress: 1 scene → skip

### Priority 2: MAIN LOCATION

The location that appears in MOST scenes:
- Count `appears_in` for each location
- Use asset ID in the prompt (e.g., "scorpion_desert") — the location reference is attached
- Describe only lighting/mood/atmosphere, NOT what the location looks like

For truck-stop-revenge:
- diner_interior: 6 scenes → **PRIMARY LOCATION**
- diner_exterior: 2 scenes → skip (or barely visible through window)

### Priority 3: KEY RECURRING OBJECTS

Include as MANY assets from the script as possible — characters, objects, location details:
- Objects with `importance: "critical"` or appearing in 2+ scenes — MUST include
- Even objects appearing once — include if they fit the composition naturally
- The characteristic shot is the SOLE visual reference for grid generation — anything NOT shown here may be inconsistent across scenes
- Use asset IDs in the prompt (e.g., "telephone", "revolver") — object references are attached
- Describe only placement and interaction, NOT the object's appearance

For truck-stop-revenge:
- pie_slice: scenes 1,2, critical → **INCLUDE**
- diner_counter: scenes 2,3,4,5,9, atmospheric but recurring → **INCLUDE**
- glass_of_milk: scene 3 only → skip
- motorcycles: scenes 6,8, critical → maybe visible through window
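
The inclusion rule above can be sketched as a small filter. This is a minimal sketch, assuming each object entry in `assets.json` carries an `appears_in` list of scene numbers and an optional `importance` string; the `classify_object` helper and its `"must_include"`/`"optional"` labels are hypothetical, and treating "atmospheric" as an `importance` value is also an assumption.

```python
def classify_object(obj):
    """Apply the Priority 3 rule to one object entry (assumed schema)."""
    scene_count = len(set(obj.get("appears_in", [])))
    if obj.get("importance") == "critical" or scene_count >= 2:
        return "must_include"
    # Single-scene, non-critical objects go in only if they fit the frame.
    return "optional"


# Scene lists are the ones quoted for truck-stop-revenge above.
objects = [
    {"id": "pie_slice", "appears_in": [1, 2], "importance": "critical"},
    {"id": "diner_counter", "appears_in": [2, 3, 4, 5, 9], "importance": "atmospheric"},
    {"id": "glass_of_milk", "appears_in": [3]},  # importance not stated in the source
    {"id": "motorcycles", "appears_in": [6, 8], "importance": "critical"},
]

decisions = {obj["id"]: classify_object(obj) for obj in objects}
```

With this data, `glass_of_milk` is the only entry classified as optional, which matches the skip decision in the list above.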

### CRITICAL: Full-Body Framing

All characters MUST be shown full-body from head to feet — NO cropping at the knees, waist, or ankles. The camera should be far enough back to capture every character's complete figure including footwear. This is essential because the grid needs full-body references for consistency across scenes.

### Priority 4: VISUAL STYLE

From the style section of `script.json`:
- Visual aesthetic (noir, cinematic, cartoon, etc.)
- Reference (Coen Brothers, Tarantino, etc.)
- Lighting and color palette
- This MUST be strong so grid can match it

## Reference Images

Reference images (character sheets, location photos, object photos) are attached automatically to the NanoBanana API call. A `[REFERENCE IMAGES: ...]` prefix is prepended to your prompt listing each image with its role, e.g.:

```
Image 1: Tom Morrison character sheet (face/body/clothing reference)
Image 3: Scorpion Desert location reference (match this environment appearance)
Image 5: Telephone object reference (match this object appearance exactly)
```

**Your job**: Write a prompt that uses the SAME asset names/IDs so NanoBanana can connect your text to the right reference image. Do NOT re-describe what's in the reference images — NanoBanana already sees them.

## Scene Analysis Process

Before writing prompt:

1. **Count character appearances** - which characters are in most scenes?
2. **Count location appearances** - which location dominates?
3. **Identify critical recurring objects** - what's in multiple scenes?
4. **Check visual style** - what aesthetic MUST be established?

Example for truck-stop-revenge:
- Characters: old_man (4), biker_1 (5), biker_2 (5), biker_3 (4), waitress (1)
  → Show: old_man + biker_1 (or bikers in background)
- Locations: diner_interior (6), diner_exterior (2)
  → Show: diner_interior fully
- Objects: pie_slice (critical, 2 scenes), counter (5 scenes), motorcycles (critical, 2 scenes)
  → Show: pie on counter, maybe bikes visible through window
- Style: noir-cinematic, Coen Brothers, dark comedy
  → Dramatic lighting, high contrast
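
Steps 1 and 2 of the analysis can be sketched as a ranking by scene count. This is a minimal sketch, assuming each asset has an `id`, a `type`, and an `appears_in` list; the real `assets.json` schema may differ, `rank_by_scene_count` is a hypothetical helper, and the scene numbers below are placeholders chosen only to reproduce the counts quoted above.

```python
def rank_by_scene_count(assets, asset_type):
    """Return (asset_id, scene_count) pairs for one asset type, most-used first."""
    counts = [
        (a["id"], len(set(a.get("appears_in", []))))
        for a in assets
        if a.get("type") == asset_type
    ]
    # Sort by count descending, then by ID so ties come out deterministically.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


# Placeholder scene numbers; only the list lengths match the source.
assets = [
    {"id": "old_man", "type": "character", "appears_in": [1, 2, 3, 4]},
    {"id": "biker_1", "type": "character", "appears_in": [2, 3, 4, 5, 6]},
    {"id": "biker_2", "type": "character", "appears_in": [2, 3, 4, 5, 6]},
    {"id": "biker_3", "type": "character", "appears_in": [2, 3, 4, 5]},
    {"id": "waitress", "type": "character", "appears_in": [1]},
    {"id": "diner_interior", "type": "location", "appears_in": [1, 2, 3, 4, 5, 6]},
    {"id": "diner_exterior", "type": "location", "appears_in": [7, 8]},
]

characters = rank_by_scene_count(assets, "character")
locations = rank_by_scene_count(assets, "location")
primary_location = locations[0][0]
```

The ranking only orders the candidates. Which characters share the frame is still a composition decision.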

## NanoBanana Prompt Format

**CRITICAL: Reference images are attached to the NanoBanana API call.** Your prompt must USE asset names/IDs that match the reference image labels — do NOT re-describe what's already shown in the reference images.

### The Core Rule

Reference images show WHAT things look like. Your prompt describes the SCENE: who is where, doing what, from what camera angle, with what lighting.

- **Characters**: Name them by asset ID. Write "tom_morrison stands..." NOT "A big tall rangy man with close-cropped black hair, angular jaw...". NanoBanana has the character sheet.
- **Objects**: Name them by asset ID. Write "tom_morrison holds a telephone, its screen glowing teal" NOT "a chunky black plastic body with a stubby coiled antenna...". NanoBanana has the object reference.
- **Locations**: Name them by asset ID, add only mood/lighting. Write "scorpion_desert canyon, harsh golden light, long hard shadows" NOT a paragraph about rock colors.
- **Only describe what references CAN'T show**: pose, action, camera angle, lighting direction, spatial relationships, mood, atmosphere. These are what text adds.

### Structure

```
[VISUAL STYLE — one line from script.json]

[SCENE — who is where, doing what, using asset IDs]

[COMPOSITION — camera angle, framing, spatial planes]

[LIGHTING — direction, color temperature, key contrasts]

[MOOD — emotional tone, atmosphere]

NEGATIVE PROMPT: [things to avoid]
```
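
The template above can be assembled mechanically once each section is drafted. This is a minimal sketch: the `build_prompt` helper and the section keys are hypothetical names, and the sample text is illustrative rather than taken from a real script.

```python
SECTION_ORDER = ("style", "scene", "composition", "lighting", "mood")


def build_prompt(sections, negative_terms):
    """Join the five sections in template order, then append the negative prompt."""
    missing = [name for name in SECTION_ORDER if not sections.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing sections: {', '.join(missing)}")
    paragraphs = [sections[name].strip() for name in SECTION_ORDER]
    paragraphs.append("NEGATIVE PROMPT: " + ", ".join(negative_terms))
    # Blank lines between paragraphs mirror the template layout.
    return "\n\n".join(paragraphs)


prompt = build_prompt(
    {
        "style": "Cinematic noir aesthetic, dramatic lighting.",
        "scene": "old_man sits at the diner_interior counter, staring at a pie_slice.",
        "composition": "Full-body wide shot, slight low angle.",
        "lighting": "Hard overhead key light, warm amber against cool blue.",
        "mood": "Tense, quiet moment.",
    },
    ["cartoon", "watermark", "extra limbs"],
)
```

Keeping the order fixed puts the style and scene paragraphs first, in line with the front-loading guideline.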

### Writing Guidelines

1. **Use asset IDs as names** — "tom_morrison", "telephone", "scorpion_desert" — matching the reference image labels exactly
2. **Be concise** — 150-300 words with 8 reference images, not 800+. Images carry visual info, text carries scene direction.
3. **Front-load style and scene** — NanoBanana weights early words more
4. **Lighting is your main tool** — direction, color, contrast, shadows. This is what text adds beyond the references.
5. **Composition is your other tool** — camera angle, framing, what's in foreground/mid/background
6. **Negative prompts matter** — prevent common failure modes
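
The 150-300 word target in guideline 2 is easy to check before writing the file. This is a minimal sketch: `check_length` is a hypothetical helper, words are counted by splitting on whitespace, and counting the negative prompt line toward the total is an assumption.

```python
def check_length(prompt, low=150, high=300):
    """Return (word_count, verdict) for a whitespace-tokenised prompt."""
    count = len(prompt.split())
    if count < low:
        return count, "too short"
    if count > high:
        return count, "too long"
    return count, "ok"
```

When no reference images are attached, the verbose fallback described later applies and this check can be skipped.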

### Example Prompt (with reference images attached)

```
Cinematic noir aesthetic, Coen Brothers visual style — dramatic lighting, deadpan realism.

old_man sits at the diner_interior counter, stone-faced, staring at a pie_slice on a white plate in front of him. His calloused hands rest on the worn Formica counter. Behind him in the mid-ground, biker_1 stands with arms crossed, watching. The diner stretches behind them — counter stools, checkered floor, large window showing motorcycles in the parking lot outside.

Full-body wide shot, slight low angle, every character visible head to feet. old_man sharp in foreground, biker_1 slightly soft in mid-ground, diner environment filling background. Three visual planes.

Dramatic key light from overhead fluorescents casting hard shadows downward. Warm amber on counter and wood paneling, cool blue from window. High contrast, deep blacks. Film grain texture.

Tense, quiet moment. The calm before confrontation.

NEGATIVE PROMPT: cartoon, anime, 3D render, CGI, oversaturated, fantasy, text overlay, watermark, distorted anatomy, extra limbs, blurry, low quality
```

Notice: old_man, biker_1, pie_slice, diner_interior, motorcycles — all named by asset IDs matching the attached reference images. No need to describe old_man's wrinkles or the diner's decor — the images show that.
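
A quick way to confirm that the intended asset IDs actually appear in the prompt text is a whole-token search. This is a minimal sketch using Python's standard `re` module; `missing_asset_ids` is a hypothetical helper, and the match is case-sensitive on the assumption that IDs must match the reference labels exactly.

```python
import re


def missing_asset_ids(prompt, expected_ids):
    """Return the expected asset IDs that never appear as whole tokens in the prompt."""
    return [
        asset_id
        for asset_id in expected_ids
        if not re.search(rf"\b{re.escape(asset_id)}\b", prompt)
    ]


# Excerpt from the example prompt above (first and third sentences of the scene).
excerpt = (
    "old_man sits at the diner_interior counter, stone-faced, staring at a "
    "pie_slice on a white plate in front of him. Behind him in the mid-ground, "
    "biker_1 stands with arms crossed, watching."
)
expected = ["old_man", "biker_1", "pie_slice", "diner_interior", "motorcycles"]
missing = missing_asset_ids(excerpt, expected)
```

The word-boundary match keeps `biker_1` from being satisfied by a longer ID such as `biker_10`. In this excerpt only `motorcycles` is reported as missing, because that sentence was left out.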

### Without Reference Images (rare)

If NO reference images are available, fall back to verbose descriptions — describe every visual detail in text since NanoBanana has nothing else to work from. But this should be rare in the current pipeline.

## Key Principles (From Experience)

1. **Let reference images work** — name assets, don't re-describe them
2. **Style must be unmistakable** — one clear style line from script.json
3. **Lighting is your text superpower** — direction, color, contrast, shadows
4. **Composition is your other superpower** — camera angle, framing, spatial planes
5. **Pack it with recurring asset IDs** — more named assets = more consistency anchors for the grid
6. **Keep negative prompts comprehensive**: rule out cartoon, CGI, text, distortions

## Output Format

Write to `nanobanana_prompt.txt`:

```
[Your concise NanoBanana prompt using asset IDs]

---
METADATA:
- Characters shown: [asset_ids with scene counts, e.g. tom_morrison (15/16 scenes)]
- Location: [asset_id with scene count]
- Objects shown: [asset_ids]
- Visual style: [from script.json]
- Asset IDs used in prompt: [list all IDs that appear in the prompt text — these must match reference image labels]
```
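
The file layout above can be rendered from structured data so the metadata block stays in sync with the prompt. This is a minimal sketch: `render_output` and its argument shapes are hypothetical, and the location count and style line in the example are placeholders.

```python
def render_output(prompt, characters, location, objects, style):
    """Render the prompt, a separator, and the metadata block as one string.

    characters: list of (asset_id, scenes_shown_in, total_scenes) tuples
    location:   one (asset_id, scenes_shown_in, total_scenes) tuple
    objects:    list of asset IDs
    """
    def with_count(entry):
        asset_id, scenes, total = entry
        return f"{asset_id} ({scenes}/{total} scenes)"

    used_ids = [c[0] for c in characters] + [location[0]] + list(objects)
    lines = [
        prompt.strip(),
        "",
        "---",
        "METADATA:",
        "- Characters shown: " + ", ".join(with_count(c) for c in characters),
        "- Location: " + with_count(location),
        "- Objects shown: " + ", ".join(objects),
        "- Visual style: " + style,
        "- Asset IDs used in prompt: " + ", ".join(used_ids),
    ]
    return "\n".join(lines) + "\n"


text = render_output(
    prompt="tom_morrison holds a telephone in scorpion_desert.",
    characters=[("tom_morrison", 15, 16)],
    location=("scorpion_desert", 12, 16),  # placeholder counts
    objects=["telephone"],
    style="placeholder style line",
)
```

The result can then be saved with `pathlib.Path("nanobanana_prompt.txt").write_text(text, encoding="utf-8")`.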

## Output Files

You must create TWO files:

1. **nanobanana_prompt.txt**: the NanoBanana prompt plus metadata (format above)
2. **user_message.txt**: a friendly message explaining your creative choices

The orchestrator will then:
1. Call NanoBanana API with your prompt to generate the image
2. Show both the image AND your user_message to the user
3. If approved → this becomes THE reference for grid generation
4. If rejected → you iterate with feedback

## User Message File (user_message.txt)

After creating `nanobanana_prompt.txt`, write `user_message.txt` — a friendly message to the user.

This message should explain (2-3 paragraphs):

**1. What you chose to show and why:**
- Which character(s) and why (scene count, story importance)
- Which location and why (most-used, establishes visual world)
- Key recurring objects included
- Visual style and how it anchors the whole story

**2. Creative considerations:**
- Why this composition works as a reference for grid generation
- What elements the grid will be able to match from this image
- Any visual challenges (lighting, character consistency, mood)
- What's deliberately excluded and why (secondary characters, minor locations)

**3. Specific suggestions for user:**

When to **Accept**:
- Characters look right for the story
- Location matches your vision
- Visual style feels appropriate
- This image could anchor all subsequent scene frames

When to **Refine** (examples):
- "The character should look [older/younger/different clothing]"
- "The lighting should be [warmer/cooler/more dramatic]"
- "Add [specific object] that's important to the story"
- "The location feels too [clean/dark/cramped] — make it more [lived-in/bright/spacious]"
- "The visual style should be more [noir/comedic/warm] — less [gritty/cartoonish/cold]"
- "Show [different character] instead of [current one]"

When to **Regenerate** (examples):
- "The visual style is completely wrong for this story"
- "This doesn't capture the mood at all"
- "Start over with a completely different composition"
- "Wrong character/location focus"

**Tone**: Be conversational and specific. Mention actual character names, scene numbers, and story details.

**Example message**:
```
I chose to show the old man (Paul) and Dave's golden retriever at the autumn lake — these are the two most recurring elements across 6 of 8 scenes. The lake setting appears in scenes 1-6, and the dog is the visual centerpiece of the whole story. I placed the dog on the glassy water surface to immediately establish the miracle element, with Paul watching from the shore with his signature deadpan expression.

The visual style is warm autumn cinematography — golden hour light, reflected fall colors on the water, muted earth tones. This gives us a consistent color palette the grid can match across all lake scenes. I deliberately excluded Dave from this shot because Paul's stoic presence contrasts better with the dog's impossible water-walking, and we need that contrast to be visually established before the grid splits into individual scenes.

**Accept** if the character look, lake setting, and autumn color palette work for you. **Refine** if you want: different lighting (maybe overcast instead of golden hour?), different character focus (show Dave instead of Paul?), more/less dramatic composition, or different clothing details. **Regenerate** if the overall visual direction feels wrong for the story's comedic tone.
```
