Prompting Guide for LTX-2

Master prompting for LTX-2. Discover how to build detailed, story-driven prompts that turn your creative vision into stunning AI-generated videos.

A good prompt makes all the difference when working with the LTX-2 model. The key is to paint a complete picture of the story you’re telling, one that flows naturally from beginning to end and covers all the elements the model needs to bring your vision to life. If you’re new to writing prompts for video, this guide will help you construct an effective one.

PROMPT:

An action-packed, cinematic shot of a monster truck driving fast towards the camera. As the truck passes the camera, it pans left to follow the truck’s reckless drive. Dust and motion blur surround the truck, and the camera has a handheld feel as it tries to track the truck’s ride into the distance. The truck then drifts and turns around, then drives back towards the camera until seen in extreme close-up.

PROMPT:

A warm sunny backyard. The camera starts in a tight cinematic close-up of a woman and a man in their 30s, facing each other with serious expressions. The woman, emotional and dramatic, says softly, “That’s it... Dad’s lost it. And we’ve lost Dad.”
The man exhales, slightly annoyed: “Stop being so dramatic, Jess.”
A beat. He glances aside, then mutters defensively, “He’s just having fun.”
The camera slowly pans right, revealing the grandfather in the garden wearing enormous butterfly wings, waving his arms in the air like he’s trying to take off.
He shouts, “Wheeeew!” as he flaps his wings with full commitment.
The woman covers her face, on the verge of tears. The tone is deadpan, absurd, and quietly tragic.

Key Aspects to Include

  • Establish the shot. Use cinematography terms that match your preferred film genre. Include aspects like scale or specific category characteristics to further refine the style you’re looking for.
  • Set the scene. Describe lighting conditions, color palette, surface textures, and atmosphere to shape the mood. 
  • Describe the action. Write the core action as a natural sequence, flowing from beginning to end.
  • Define your character(s). Include age, hairstyle, clothing, and distinguishing details. Express emotions through physical cues.
  • Identify camera movement(s). Specify when the view should shift and how. Including how subjects or objects appear after the camera motion gives the model a better idea of how to finish the motion.
  • Describe the audio. Use clear descriptions for ambient sounds, music, and speech. For dialogue, place the text between quotation marks and, if required, mention the language and accent you would like the character to have.

PROMPT:

INT. OVEN – DAY. Static camera from inside the oven, looking outward through the slightly fogged glass door. Warm golden light glows around freshly baked cookies. The baker’s face fills the frame, eyes wide with focus, his breath fogging the glass as he leans in. Subtle reflections move across the glass as steam rises.
Baker (whispering dramatically): “Today… I achieve perfection.”
He leans even closer, nose nearly touching the glass.
“Golden edges. Soft center. The gods themselves will smell these cookies and weep.”
Baker: “Wait—”
(beat)
“Did I… forget the chocolate chips?”
Cut to side view — coworker pops into frame, chewing casually.
Coworker (mouth full): “Nope. You forgot the sugar.”
Quick zoom back to the baker’s horrified face, pressed against the oven door, as cookies deflate behind the glass. Steam drifts upward in slow motion.
Pixar-style acting and timing.
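
If you assemble prompts in a script, the six key aspects listed above can be kept as separate fields and joined into one paragraph. What follows is a minimal Python sketch, not part of LTX-2 or any official tooling: the `ScenePrompt` class, its field names, and the `spoken_line` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    """Hypothetical container for the six key aspects (not an LTX-2 API)."""
    shot: str
    scene: str
    action: str
    characters: str
    camera: str
    audio: str

    def render(self) -> str:
        # Join the parts, in the guide's order, into a single paragraph.
        parts = [self.shot, self.scene, self.action,
                 self.characters, self.camera, self.audio]
        sentences = []
        for part in parts:
            text = " ".join(part.split())  # collapse newlines and extra spaces
            if not text:
                continue
            if text[-1] not in '.!?"”':
                text += "."  # close the sentence if the field left it open
            sentences.append(text)
        return " ".join(sentences)


def spoken_line(speaker: str, line: str, delivery: str = "") -> str:
    """Put dialogue between quotation marks, with an optional delivery note."""
    how = f" {delivery}" if delivery else ""
    return f'{speaker} says{how}: "{line}"'


prompt = ScenePrompt(
    shot="A cinematic close-up inside a small bakery",
    scene="Warm golden light glows on flour-dusted wooden counters",
    action="A baker opens the oven door and lifts out a tray of cookies",
    characters="He is in his 40s, with a grey beard and a white apron, eyes wide",
    camera="The camera slowly pushes in until the tray fills the frame",
    audio=spoken_line("The baker", "Today I achieve perfection.",
                      "in a dramatic whisper"),
)
print(prompt.render())
```

Keeping each aspect in its own field makes it easy to change one element, such as the camera movement, while iterating on the rest of the prompt.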

For Best Results

  • Keep your prompt in a single flowing paragraph to give the model a cohesive scene to work with. 
  • Use present tense verbs to describe movement and action.
  • Match your detail to the shot scale. Closeups need more precise detail than wide shots.
  • When describing camera movement, focus on the camera’s relationship to the subject. 
  • You should expect to write 4 to 8 descriptive sentences to cover all the key aspects of the prompt. 
  • Don’t be afraid to iterate! LTX-2 is designed for fast experimentation, so refining your prompt is part of the workflow.

PROMPT:

INT. DAYTIME TALK SHOW SET – AFTERNOON
Soft studio lighting glows across a warm-toned set. The audience murmurs faintly as the camera pans to reveal three guests seated on a couch — a middle-aged couple and the show’s host sitting across from them.
The host leans forward, voice steady but probing:
Host: “When did you first notice that your daughter, Missy, started to spiral?”
The woman’s face crumples; she takes a shaky breath and begins to cry. Her husband places a comforting hand on her shoulder, looking down before turning back toward the host.
Father (quietly, with guilt): “We… we don’t know what we did wrong.”
The studio falls silent for a moment. The camera cuts to the host, who looks gravely into the lens.
Host (to camera): “Let’s take a look at a short piece our team prepared — chronicling Missy’s downward path.”
The lights dim slightly as the camera pushes in on the mother’s tear-streaked face. The studio monitors flicker to life, beginning to play the segment as the audience holds its breath.
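
Two of the guidelines under For Best Results, a single flowing paragraph and roughly 4 to 8 descriptive sentences, are easy to check before you submit a draft. The Python sketch below is a rough heuristic only: `count_sentences` and `check_prompt` are hypothetical helpers, and the sentence splitting is naive, so screenplay headers such as "INT." are counted as sentences.

```python
import re

# Naive sentence boundary: end punctuation, an optional closing quote,
# then whitespace or the end of the text.
SENTENCE_END = re.compile(r"""[.!?…]+["”']?(?:\s+|$)""")


def count_sentences(prompt: str) -> int:
    """Roughly count sentences; abbreviations are miscounted."""
    pieces = SENTENCE_END.split(prompt.strip())
    return sum(1 for piece in pieces if piece.strip())


def check_prompt(prompt: str, low: int = 4, high: int = 8) -> list[str]:
    """Return warnings; an empty list means nothing was flagged."""
    warnings = []
    text = prompt.strip()
    if "\n" in text:
        warnings.append("Prompt has line breaks; consider a single flowing paragraph.")
    n = count_sentences(text)
    if n < low or n > high:
        warnings.append(f"Found about {n} sentences; the guide suggests {low} to {high}.")
    return warnings


draft = "A dog runs across a beach. The camera follows."
for warning in check_prompt(draft):
    print(warning)
```

Treat the warnings as prompts for a second look rather than hard rules, since dialogue-heavy scenes can reasonably run longer.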

Additional Helpful Terms

This is not an exhaustive list. Use it as a source of examples for crafting the result you’re looking for.

Categories

Animation: stop-motion, 2D/3D animation, claymation, hand-drawn

PROMPT:

Pinocchio is sitting in an interrogation room, looking nervous and slightly sweating. He's saying very quietly to himself, "I didn't do it... I didn't do it... I'm not a murderer." Pinocchio's nose is quickly getting longer and longer. The camera is zooming in on the double-sided mirror at the back of the room. The mirror is turning black as the camera approaches it, exposing a blurry silhouette of two FBI detectives who stand in the dimly lit room on the other side. One of them is saying, "I'm telling you, I have a feeling something is off with this kiddo."

Stylized: comic book, cyberpunk, 8-bit pixel, surreal, minimalist, painterly, illustrated

PROMPT:

A young African American woman wears a futuristic transparent visor and a bodysuit with a tube attached to her neck. She is soldering a robotic arm. She stops and looks to her right as she hears a suspicious, strong hit sound from a distance. She gets up slowly from her chair and says with an angry African American accent: "Rick, I told you to close that goddamn door after you!" Then a futuristic blue alien explorer with dreadlocks, wearing a rugged outfit, walks into the scene excitedly holding a futuristic device and says with a low robotic voice: "Fuck the door, look what I found!" The alien hands the woman the device, and she looks down at it excitedly as the camera zooms in on her intrigued, illuminated face. She then says: "Is this what I think it is?" She smiles excitedly. Sci-fi style cinematic scene.

Cinematic: period drama, film noir, fantasy, epic space opera, thriller, modern romance, experimental film, arthouse, documentary 

PROMPT:

Cinematic action-packed shot. The man says quietly: "We need to run." The camera zooms in on his mouth, then he immediately screams: "NOW!" The camera zooms back out, he turns around and starts running away, and the camera tracks his run in handheld style. The camera cranes up and shows him running into the distance down the street on a busy New York night.

Visual Details

  • Lighting conditions: flickering candles, neon glow, natural sunlight, dramatic shadows
  • Textures: rough stone, smooth metal, worn fabric, glossy surfaces
  • Color palette: vibrant, muted, monochromatic, high contrast
  • Atmospheric elements: fog, rain, dust, particles, smoke

PROMPT:

The camera opens in a calm, sunlit frog yoga studio. Warm morning light washes over the wooden floor as incense smoke drifts lazily in the air. The senior frog instructor sits cross-legged at the center, eyes closed, voice deep and calm. “We are one with the pond.” All the frogs answer softly: “Ommm...” “We are one with the mud.” “Ommm...” He smiles faintly. “We are one with the flies.” A quiet pause.
The camera slowly pans to the side — one frog twitches, eyes darting. Suddenly — *thwip!* — its tongue snaps out, catching a fly mid-air and pulling it into its mouth. The master exhales slowly, still serene.
“But we do not chase the flies…”
Beat. “…not during class.” The guilty frog freezes, then lowers its head in visible shame, folding its hands back into the meditative pose. The other frogs resume their chant: “Ommm...” Camera holds for a moment on the embarrassed frog, eyes closed too tightly, pretending nothing happened.

Sound and Voice

  • Setting: ambient coffee shop noises, dripping rain and wind blowing, forest ambience with birds singing
  • Dialogue style: energetic announcer, resonant voice with gravitas, distorted radio-style, robotic monotone, childlike curiosity
  • Volume: quiet whisper, mutters, shouts, screams

PROMPT:

INT. TRAIN STATION – SUNSET LIGHT
Close-up shot of two young women, lovers, in a tight embrace, wrapped in wool coats and checkered scarves. The golden light from the window washes softly over their faces. The girl with dark hair and bangs trembles slightly, her sweet voice breaking: “I hate New York. I don’t wanna go.”
The young blond girl in front says in a loving, soft voice: “You don’t hate New York… and you’re going.”
The blond girl closes her eyes, tearing up.
Girl with dark hair again (softly, almost whispering): “If you’re not in New York… then I hate New York.”
They hold each other again as the sound of a distant train horn echoes. The camera lingers on their faces, bathed in amber light, before fading to black.

Technical Style Markers

  • Camera language: follows, tracks, pans across, circles around, tilts upward, pushes in, pulls back, overhead view, handheld movement, over-the-shoulder, wide establishing shot, static frame
  • Film characteristics: jittery stop-motion, pixelated edges, lens flares, film grain
  • Scale indicators: expansive, epic, intimate, claustrophobic
  • Pacing and temporal effects: slow motion, time-lapse, rapid cuts, lingering shot, continuous shot, freeze-frame, fade-in, fade-out, seamless transition, dynamic movement, sudden stop
  • Specific visual effects (if relevant): particle systems, motion blur, depth of field

PROMPT:

An animated cinematic shot. A robot walks slowly as the camera dollies back, keeping the robot’s slow walk in a medium shot. The robot starts running slowly and heavily. It then stops, and the camera keeps dollying back until a similar blue robot appears in an over-the-shoulder shot.

What Works Well with LTX-2

Cinematic compositions:
Wide, medium, and close-up shots with thoughtful lighting, shallow depth of field, and natural motion.
Emotive human moments:
LTX-2 excels at single-subject emotional expressions, subtle gestures, and facial nuance.
Atmosphere & setting:
Atmospheric details like fog, mist, rain, golden-hour light, soft shadows, reflections, and ambient textures all help ground the scene.
Clean, readable camera language:
Clear directions like “slow dolly in,” “handheld tracking,” or “over-the-shoulder” improve consistency.
Stylized aesthetics:
Painterly, noir, analog film look, fashion editorial, pixelated animation, or surreal art styles work especially well when named early in the prompt.
Lighting and mood control:
Backlighting, color palettes, soft rim light, flickering lamps — these anchor tone better than generic mood words.
Voice:
Characters can talk and sing in various languages.

PROMPT:

EXT. SMALL TOWN STREET – MORNING – LIVE NEWS BROADCAST
The shot opens on a news reporter standing in front of a row of cordoned-off cars, yellow caution tape fluttering behind him. The light is warm, early sun reflecting off the camera lens. The faint hum of chatter and distant drilling fills the air.
The reporter, composed but visibly excited, looks directly into the camera, microphone in hand.
Reporter (live):
“Thank you, Sylvia. And yes — this is a sentence I never thought I’d say on live television — but this morning, here in the quiet town of New Castle, Vermont… black gold has been found!”
He gestures slightly toward the field behind him.
Reporter (grinning):
“If my cameraman can pan over, you’ll see what all the excitement’s about.”
The camera pans right, slowly revealing a construction site surrounded by workers in hard hats. A beat of silence — then, with a sudden roar, a geyser of oil erupts from the ground, blasting upward in a violent plume.
Workers cheer and scramble, the black stream glistening in the morning light. The camera shakes slightly, trying to stay focused through the chaos.
Reporter (off-screen, shouting over the noise):
“There it is, folks — the moment New Castle will never forget!”
The camera catches the sunlight gleaming off the oil mist before pulling back, revealing the entire scene — the small-town skyline silhouetted against the wild fountain of oil.

What to Avoid with LTX-2

Internal states:
Avoid emotional labels like “sad” or “confused” without describing visual cues. Use posture, gesture, and facial expression instead.
Text and logos:
LTX-2 does not currently generate readable or consistent text. Avoid signage, brand names, or printed material.
Complex physics or chaotic motion:
Non-linear or fast-twisting motion (e.g., jumping, juggling) can lead to artifacts or glitches. However, dancing can work well.
Scene complexity overload:
Too many characters, layered actions, or excessive objects reduce clarity and model accuracy.
Inconsistent lighting logic:
Avoid mixing conflicting light sources (e.g., “a warm sunset with cold fluorescent glow”) unless clearly motivated.
Overcomplicated prompts:
The more actions, characters, or instructions you add, the higher the chance that some of them won’t appear in the output. Begin with a simple prompt and layer on additional instructions as you iterate.
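
Some of the points above can be turned into a quick pre-flight check on a draft prompt. The Python sketch below is a heuristic only: the word lists are small illustrative assumptions rather than an official LTX-2 vocabulary, and `lint_prompt` is a hypothetical helper name. Quoted dialogue is skipped, on the assumption that spoken words are heard rather than shown.

```python
import re

# Illustrative word lists only: assumptions for this sketch,
# not an official LTX-2 vocabulary.
EMOTION_LABELS = {"sad", "confused", "happy", "angry", "nervous"}
TEXT_TERMS = {"sign", "signage", "logo", "billboard", "poster", "subtitle"}
RISKY_MOTION = {"jumping", "juggling"}

# Matches dialogue in straight or curly double quotes so it can be skipped.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')


def lint_prompt(prompt: str) -> list[str]:
    """Return notes about wording the guide suggests avoiding."""
    visible = QUOTED.sub(" ", prompt).lower()
    words = set(re.findall(r"[a-z]+", visible))
    notes = []
    for word in sorted(words & EMOTION_LABELS):
        notes.append(f"'{word}': pair the label with posture, gesture, or facial expression.")
    for word in sorted(words & TEXT_TERMS):
        notes.append(f"'{word}': readable text may not render consistently.")
    for word in sorted(words & RISKY_MOTION):
        notes.append(f"'{word}': fast or twisting motion can produce artifacts.")
    return notes


for note in lint_prompt("A sad man is juggling next to a neon sign."):
    print(note)
```

A flagged word is not necessarily a problem. An emotional label is fine when the prompt also describes the visual cue that conveys it, so extend or trim the lists to suit your own prompts.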