Grok Automation
07 · Mode · 5 min read

Reference to Video

The most powerful mode — and the one with the strictest rules about how to write the prompt.

When to use this mode

Reference to Video is the right pick when:

  • You have multiple distinct elements that need to coexist in one shot — a character, a prop, a backdrop.
  • You want fine-grained control over which references appear in which prompt, beyond what filename-matching gives you.
  • You’re producing a consistent-character storyboard where the same hero appears across multiple shots.

If you have only one image and want it animated as-is, Image to Video is simpler.

The @filename rule

Unlike the other modes, Reference to Video doesn’t match references by substring. It looks for an explicit @filename token in your prompt text. Each @filename pulls the matching image into the composition.

Write the filename without its extension, so @hero-frontal matches hero-frontal.png in your library. Matching is case-insensitive. Spaces in filenames work but make @references awkward, so most people rename their files to use hyphens.
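If it helps to see the rule as logic, here is a minimal sketch of how @filename matching could work. It is an illustration, not the extension's actual code: the function name resolve_references, the token pattern (letters, digits, hyphens, underscores), and the sample library are all assumptions, and filenames containing spaces are deliberately left out.

```python
import re
from pathlib import PurePath

# Assumption: a token is "@" followed by letters, digits, hyphens, or underscores.
TOKEN = re.compile(r"@([A-Za-z0-9_-]+)")


def resolve_references(prompt: str, library: list[str]) -> list[str]:
    """Return the library files named by @filename tokens, in prompt order."""
    # Index the library by lowercase stem (the filename without its extension),
    # which gives extension-free, case-insensitive matching.
    by_stem = {PurePath(name).stem.lower(): name for name in library}
    matched = []
    for token in TOKEN.findall(prompt):
        name = by_stem.get(token.lower())
        if name is not None and name not in matched:
            matched.append(name)
    return matched


# Hypothetical library and prompt, for illustration only.
library = ["hero-frontal.png", "Cafe.png"]
print(resolve_references("@Hero-Frontal walks into @cafe.", library))
# ['hero-frontal.png', 'Cafe.png']
```

Note that the trailing full stop after @cafe is not part of the token, and that the capitalisation in the prompt does not have to match the file on disk.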

The side panel surfaces this rule as a hint right above the image dropzone:

reference each image in prompt with @filename

[Screenshot pending: Reference image(s) section with the @filename hint visible to the right of the label.]
The hint only appears in Reference to Video mode — the other modes use different attachment rules.

Set up a run

  1. Click the Reference to Video mode tile.
  2. In Reference image(s), upload every element you might want to use: characters, props, backdrops. Name them deliberately (@hero, @cafe, @cup).
  3. In Prompts, write each prompt with explicit @filename references for the elements it needs.
  4. In Refine, set Length, Quality, and Aspect exactly as in Text to Video.
  5. Click Run →.

A worked example

Library:

  • hero.png — your protagonist, three-quarter portrait.
  • cafe.png — the establishing interior shot of a café.
  • cup.png — close-up of a coffee cup.
  • villain.png — the protagonist’s foil.

Prompts (three shots of a consistent-character scene):

@hero walks into @cafe, looks around, takes a seat by the window. Soft afternoon light.

Close-up on @hero reaching for @cup, steam rising, sips, sets it down. Same lighting.

@villain enters @cafe through the door behind @hero, stops, watches. Tension.

Each prompt explicitly tells Grok which library images to composite. The hero stays visually consistent across all three shots because the same @hero image anchors each one.
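To make the example concrete, the sketch below works out which library images each of the three prompts would pull in, and which image appears in every shot. It is a rough model only: the token pattern and the helper name images_for are assumptions, not the extension's real implementation.

```python
import re

# Assumption: a token is "@" followed by letters, digits, hyphens, or underscores.
TOKEN = re.compile(r"@([A-Za-z0-9_-]+)")

library = ["hero.png", "cafe.png", "cup.png", "villain.png"]
prompts = [
    "@hero walks into @cafe, looks around, takes a seat by the window. Soft afternoon light.",
    "Close-up on @hero reaching for @cup, steam rising, sips, sets it down. Same lighting.",
    "@villain enters @cafe through the door behind @hero, stops, watches. Tension.",
]

# Map each lowercase filename (without extension) to the library file.
stems = {name.rsplit(".", 1)[0].lower(): name for name in library}


def images_for(prompt):
    """List the library images a prompt references, in order, without repeats."""
    seen = []
    for token in TOKEN.findall(prompt):
        name = stems.get(token.lower())
        if name and name not in seen:
            seen.append(name)
    return seen


plan = [images_for(p) for p in prompts]
for n, images in enumerate(plan, start=1):
    print(f"shot {n}: {', '.join(images)}")

# Images referenced by every prompt are the ones anchoring consistency.
in_every_shot = set.intersection(*map(set, plan))
print("in every shot:", sorted(in_every_shot))
```

Under these assumptions, hero.png is the only image present in all three shots, which is why the hero is the element that stays consistent.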

What the prompt list shows mid-run

When you click Run, each prompt row gets:

  • The prompt text with @filename tokens highlighted.
  • Thumbnail strip showing exactly which library images were pulled in.
  • The usual status progression: queued, generating · N%, done.

If a row says failed with unknown @reference, you’ve typed an @ token that doesn’t match any library image. Common cause: the filename uses an underscore but the prompt uses a hyphen, or vice versa.
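You can catch these mismatches before running. The sketch below is a hypothetical pre-flight check, not a feature of the extension: it lists every @token with no matching library file and uses Python's difflib to suggest near-misses such as a hyphen typed where the filename has an underscore. The function name unknown_references and the token pattern are assumptions.

```python
import re
from difflib import get_close_matches

# Assumption: a token is "@" followed by letters, digits, hyphens, or underscores.
TOKEN = re.compile(r"@([A-Za-z0-9_-]+)")


def unknown_references(prompt, library):
    """Map each unmatched @token to the closest library names, if any."""
    # Compare against lowercase filenames with the extension removed.
    stems = [name.rsplit(".", 1)[0].lower() for name in library]
    report = {}
    for token in TOKEN.findall(prompt):
        key = token.lower()
        if key not in stems:
            report[token] = get_close_matches(key, stems, n=3, cutoff=0.6)
    return report


# Hypothetical library: the file uses an underscore, the prompt a hyphen.
library = ["hero_frontal.png", "cafe.png"]
print(unknown_references("@hero-frontal walks into @cafe", library))
# {'hero-frontal': ['hero_frontal']}
```

An empty result means every token in the prompt has a matching file.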

Auto-match and Max images don’t apply here

The Auto-attach matching reference images checkbox and Max input images per prompt dropdown — visible in Image to Image — are hidden in Reference to Video. The @filename system replaces both, so the side panel removes the irrelevant controls.

Tips you’ll want eventually

  • Establish naming conventions early. Once you have 30 references in a library, @hero-v3-shoulders-up is a much better filename than @asset_007.
  • Reuse the same library across a project. Reference to Video’s strength is consistency — keep your @hero and @cafe constant and you’ll get a much tighter storyboard than swapping in fresh stills per shot.
  • Combine with Chain prompts carefully. Chain mode replaces the start frame with the previous output, which can fight the @reference system. The two can be used together, but the chained output overrides any character references mid-shot.
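If you are settling on a naming convention, a small helper can propose reference-friendly names before you upload. This is one possible convention rather than anything the extension requires: the function name reference_friendly is hypothetical, and the choice to turn spaces, underscores, and other punctuation into hyphens is an assumption made to avoid the hyphen and underscore mix-ups described earlier.

```python
import re
from pathlib import PurePath


def reference_friendly(filename: str) -> str:
    """Suggest a lowercase, hyphenated filename that is easy to type after '@'."""
    path = PurePath(filename)
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    stem = re.sub(r"[^A-Za-z0-9]+", "-", path.stem).strip("-").lower()
    return f"{stem}{path.suffix.lower()}"


print(reference_friendly("Hero v3 shoulders up.PNG"))  # hero-v3-shoulders-up.png
print(reference_friendly("asset_007.png"))             # asset-007.png
```

The helper only returns a suggested name; renaming the files themselves is left to you.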

Grok Automation is an independent browser extension for Grok users. Not affiliated with xAI.