LTX-2.3 Prompt Guide: Tips For Prompting LTX-2.3

Learn how to write effective prompts for LTX-2.3 video generation — from cinematic language and scene structure to dialogue, audio, and camera movement.

Rachel Luxemburg

April 13, 2026

Key Takeaways:

LTX-2.3 responds best to long, detailed prompts — the more specific you are about subject, action, lighting, camera movement, and audio, the closer the output matches your vision
Break dialogue into short phrases with acting directions between each line, and use physical cues rather than emotional labels to direct character performance
Match your prompt length to your video length — short prompts for long videos leave the model without enough direction to fill the duration

Strong prompts produce better videos. LTX models respond best to detailed, descriptive prompts that paint a complete picture of the scene you're generating. Think of it as writing a shot description for a cinematographer.

If you're new to writing prompts for video generation, this guide will help you construct effective, production-ready prompts.

Core Principles

Be specific and descriptive

Instead of "a person walking," try "a young woman in a red coat walking briskly through a rain-soaked Tokyo street at night, neon reflections on wet pavement, handheld camera following from behind."

Describe the full scene

Include the subject, their action, the environment, the lighting, and the camera behavior. The more complete your description, the closer the output matches your intent.

Use cinematic language

Terms like "macro lens," "tracking shot," "shallow depth of field," "golden hour," and "low angle" are understood by the model and directly influence the output.

Describe audio when relevant

For endpoints that generate synchronized audio, include audio descriptions in your prompt: "the sound of rain on pavement," "soft ambient music," "a crowd cheering in the distance."

What Changed in LTX-2.3

LTX-2.3 has a redesigned text connector architecture that makes it significantly more responsive to prompt details. This means:

More faithful prompt adherence

Specific descriptions of facial expressions, timing, pauses, and emotional beats translate more reliably into the output. You can direct acting at a granular level (for example, "he pauses, looks to the side, then continues speaking with a cracking voice") and expect the model to follow.

Prompt length matters

Longer, more descriptive prompts consistently outperform short ones on LTX-2.3. If you're generating longer videos (8–10 seconds), make sure your prompt is detailed enough to fill the duration. A short prompt for a long video often results in the model rushing through the described action.

Break dialogue into segments

When prompting for speaking characters, break long sentences into shorter phrases with acting directions between them. For example:

A middle-aged man with greying hair speaks in a sad, slow-paced voice, "I remember after you kids came along..." He pauses and looks to the side, then continues, "your mom..." His eyes widen momentarily. He finishes with a cracking voice, "said something to me I never quite understood." The camera slowly zooms into his face. The audio is crisp with faint room tone.

This gives the model explicit direction on pacing, emotion, and physical acting for each beat.
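If you assemble dialogue prompts programmatically, the same pattern can be sketched in a few lines of Python. This is only an illustration of the structure described above: the `Beat` class and `render_dialogue` function are hypothetical names, not part of any LTX tooling.

```python
from dataclasses import dataclass


@dataclass
class Beat:
    """One spoken phrase plus the acting direction that introduces it."""
    direction: str  # physical cue or delivery note that leads into the line
    line: str       # the words spoken, without quotation marks


def render_dialogue(beats: list[Beat]) -> str:
    """Join beats into prose, placing each spoken line in quotation marks."""
    parts = []
    for beat in beats:
        # Drop trailing punctuation so the direction can lead into the quote.
        direction = beat.direction.strip().rstrip(",.")
        parts.append(f'{direction}, "{beat.line.strip()}"')
    return " ".join(parts)


beats = [
    Beat("A middle-aged man with greying hair speaks in a sad, slow-paced voice",
         "I remember after you kids came along..."),
    Beat("He pauses and looks to the side, then continues", "your mom..."),
    Beat("His eyes widen momentarily. He finishes with a cracking voice",
         "said something to me I never quite understood."),
]

print(render_dialogue(beats))
```

Keeping each phrase paired with its direction makes it easy to reorder beats or adjust a single pause without rewriting the whole prompt. Camera and audio notes can then be appended as ordinary sentences.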

Audio descriptions have more impact

With improved audio quality in LTX-2.3, it's worth paying more attention to audio prompts. Describe the acoustic environment, the character's voice qualities, and any ambient sounds you want.

Key Elements to Include

When writing a prompt, aim to include the following elements:

1. Establish the Shot

Use cinematography terms that match your intended genre. Include the shot scale (wide, medium, or close-up) or genre-specific characteristics, such as film grain for an analog look, to refine the visual style.

2. Set the Scene

Describe lighting conditions, color palette, surface textures, and atmosphere to establish mood and tone.

3. Describe the Action

Write the core action as a natural sequence, flowing clearly from beginning to end.

4. Define the Character(s)

Include age, hairstyle, clothing, and distinguishing features. Express emotion through physical cues, not abstract labels.

5. Identify Camera Movement(s)

Specify how and when the camera moves. Describing how subjects appear after the movement helps the model complete the motion accurately.

6. Describe the Audio

Clearly describe ambient sound, music, speech, or singing.

Place spoken dialogue in quotation marks
Specify language and accent if needed
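One way to keep all six elements in view is to fill them in as separate fields and then join them into a single paragraph. The sketch below assumes nothing about the LTX API: `ShotPrompt` and `as_sentence` are hypothetical helpers, and the field values are illustrative text adapted from the example earlier in this guide.

```python
from dataclasses import dataclass, astuple


def as_sentence(text: str) -> str:
    """Collapse whitespace and make sure the fragment ends with punctuation."""
    text = " ".join(text.split())
    if text and text[-1] not in '.!?"':
        text += "."
    return text


@dataclass
class ShotPrompt:
    # Field order mirrors the six elements listed above.
    shot: str = ""
    scene: str = ""
    action: str = ""
    characters: str = ""
    camera: str = ""
    audio: str = ""

    def render(self) -> str:
        """Join the non-empty elements into a single flowing paragraph."""
        return " ".join(as_sentence(part) for part in astuple(self) if part.strip())


prompt = ShotPrompt(
    shot="A medium tracking shot with shallow depth of field",
    scene="A rain-soaked Tokyo street at night, neon reflections on the wet pavement",
    action="A young woman walks briskly along the sidewalk",
    characters="She wears a red coat, her hair damp from the rain",
    camera="A handheld camera follows her from behind",
    audio="Rain patters on the pavement",
).render()

print(prompt)
```

An empty field is simply skipped, so the same structure works when a shot has no dialogue or no camera movement. The joined result is still plain prose, which is what the model reads.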

For Best Results

Write your prompt as a single flowing paragraph
Use present tense verbs for action and movement
Match the level of detail to the shot scale — close-ups need more detail than wide shots
Describe camera movement relative to the subject

Tips by Use Case

Text-to-Video

Start with a strong visual description. Include subject, action, environment, lighting, camera movement, and audio. The model generates everything from scratch, so detail is your primary lever.

Image-to-Video

Focus your prompt on the motion and action you want — the visual starting point is already defined by your input image. Describe what happens next: how the subject moves, how the camera follows, what sounds emerge. Avoid describing the static elements already visible in the image. Instead, describe the transition from stillness to motion.

Audio-to-Video

Your audio input anchors the temporal structure. Use the prompt to describe the visual interpretation of that audio — what scenes, subjects, and camera work should accompany the soundtrack.
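If you reuse one scene description across modes, a small helper can keep only the elements each mode calls for. The following is a rough sketch based on the tips above. The mode names and element keys are hypothetical labels chosen for this example, not API parameters.

```python
# Elements to emphasize per mode, taken from the tips above.
# Mode names and element keys are hypothetical labels, not API parameters.
EMPHASIS = {
    "text-to-video": ("subject", "action", "environment", "lighting", "camera", "audio"),
    "image-to-video": ("action", "camera", "audio"),
    "audio-to-video": ("subject", "environment", "camera"),
}


def compose(mode: str, elements: dict[str, str]) -> str:
    """Keep only the elements the chosen mode emphasizes, in a fixed order."""
    return " ".join(elements[key] for key in EMPHASIS[mode] if elements.get(key))


elements = {
    "subject": "A young woman in a red coat stands at a crosswalk.",
    "action": "She steps off the curb and walks briskly across the street.",
    "environment": "The Tokyo street is rain-soaked, with neon signs overhead.",
    "lighting": "Neon reflections glow on the wet pavement at night.",
    "camera": "A handheld camera follows her from behind.",
    "audio": "Rain patters on the pavement.",
}

print(compose("image-to-video", elements))
```

For image-to-video, the static details already visible in the input image (the coat, the street) drop out and only motion, camera, and sound remain. For audio-to-video, the audio sentence is omitted because the soundtrack is supplied as input.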

What Works Well

Strength	Description
Cinematic compositions	Wide, medium, and close-up shots with thoughtful lighting, shallow depth of field, and natural motion
Emotive human moments	Strong single-subject emotional expressions, subtle gestures, and facial nuance
Atmosphere and setting	Fog, mist, golden hour light, rain, reflections, ambient textures
Clear camera language	Explicit instructions like "slow dolly in" or "handheld tracking"
Stylized aesthetics	Painterly, noir, analog film, fashion editorial, pixelated animation
Lighting and mood control	Backlighting, color palettes, rim light, flickering lamps
Voice capabilities	Characters can talk and sing, with support for multiple languages

What to Avoid

Avoid	Why, or What to Do Instead
Internal emotional states	Use visual cues instead of labels like "sad" or "confused"
Text and logos	Readable text is not currently reliable
Complex physics	Chaotic motion can introduce artifacts (dancing is OK)
Overloaded scenes	Too many characters or actions reduce clarity
Conflicting lighting	Mixed light logic confuses scene interpretation
Overcomplicated prompts	Start simple and layer complexity gradually

Common Mistakes

Too vague: "A nice video of nature" — the model has too many options and picks arbitrarily. Be specific about what's in the frame.
Over-constrained: "Exactly 3 birds flying left to right at 45 degrees while the camera pans right at 2 degrees per second" — the model works best with natural language descriptions, not numerical specifications.
Mismatched duration: A 10-word prompt for a 10-second video — the model doesn't have enough direction to fill the duration. Long videos need long prompts.
Conflicting directions: "A still, peaceful lake with dramatic waves crashing" — contradictions confuse the model. Be internally consistent.
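Some of these mistakes can be caught with a rough pre-flight check before you submit a prompt. The sketch below is a heuristic illustration only. `lint_prompt` is a hypothetical helper, and the thresholds (`MIN_WORDS`, `MIN_WORDS_PER_SECOND`) are arbitrary placeholders to tune against your own results, not values from LTX documentation. Conflicting directions are not covered because they need human judgment.

```python
import re

# Placeholder thresholds: assumptions for this sketch, not documented limits.
MIN_WORDS = 10            # below this, treat the prompt as too vague
MIN_WORDS_PER_SECOND = 5  # rough floor for matching length to duration

# Numeric specifications such as "exactly 3", "45 degrees", or "24 fps".
NUMERIC_SPEC = re.compile(
    r"\bexactly\s+\d+\b|\b\d+(?:\.\d+)?\s*(?:degrees?|fps|per second)\b",
    re.IGNORECASE,
)


def lint_prompt(prompt: str, duration_s: float) -> list[str]:
    """Return warning codes for the common mistakes this check can detect."""
    warnings = []
    word_count = len(prompt.split())
    if word_count < MIN_WORDS:
        warnings.append("too-vague")
    if word_count < duration_s * MIN_WORDS_PER_SECOND:
        warnings.append("mismatched-duration")
    if NUMERIC_SPEC.search(prompt):
        warnings.append("over-constrained")
    return warnings


print(lint_prompt("A nice video of nature", duration_s=10))
print(lint_prompt(
    "Exactly 3 birds flying left to right at 45 degrees "
    "while the camera pans right at 2 degrees per second",
    duration_s=3,
))
```

Word count is a crude proxy for detail, so treat a warning as a cue to reread the prompt rather than a verdict on it.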

Sample Prompts

Example 1

EXT. SMALL TOWN STREET – MORNING – LIVE NEWS BROADCAST
The shot opens on a news reporter standing in front of a row of cordoned-off cars, yellow caution tape fluttering behind him. The light is warm, early sun reflecting off the camera lens. The faint hum of chatter and distant drilling fills the air. The reporter, composed but visibly excited, looks directly into the camera, microphone in hand. Reporter (live): "Thank you, Sylvia. And yes, this is a sentence I never thought I'd say on live television, but this morning, here in the quiet town of New Castle, Vermont… black gold has been found!" He gestures slightly toward the field behind him. Reporter (grinning): "If my cameraman can pan over, you'll see what all the excitement's about." The camera pans right, slowly revealing a construction site surrounded by workers in hard hats. A beat of silence, then, with a sudden roar, a geyser of oil erupts from the ground, blasting upward in a violent plume. Workers cheer and scramble, the black stream glistening in the morning light. The camera shakes slightly, trying to stay focused through the chaos. Reporter (off-screen, shouting over the noise): "There it is, folks, the moment New Castle will never forget!" The camera catches the sunlight gleaming off the oil mist before pulling back, revealing the entire scene: the small-town skyline silhouetted against the wild fountain of oil.

Example 2

The camera opens in a calm, sunlit frog yoga studio. Warm morning light washes over the wooden floor as incense smoke drifts lazily in the air. The senior frog instructor sits cross-legged at the center, eyes closed, voice deep and calm. "We are one with the pond." All the frogs answer softly: "Ommm..." "We are one with the mud." "Ommm..." He smiles faintly. "We are one with the flies." A pause. The camera pans to the side towards one frog who twitches, eyes darting. Suddenly its tongue snaps out, catching a fly mid-air and pulling it into its mouth. The master exhales slowly, still serene. "But we do not chase the flies..." Beat. "not during class." The guilty frog lowers its head in shame, folding its hands back into a meditative pose. The other frogs resume their chant: "Ommm..." Camera holds for a moment on the embarrassed frog, eyes closed too tightly, pretending nothing happened.

Helpful Terms

Visual Details

Lighting: Flickering candles, Neon glow, Natural sunlight, Dramatic shadows
Textures: Rough stone, Smooth metal, Worn fabric, Glossy surfaces
Color Palette: Vibrant, Muted, Monochromatic, High contrast
Atmosphere: Fog, Rain, Dust, Smoke, Particles

Sound and Voice

Ambient Settings: Coffee shop noise, Wind and rain, Forest ambience with birds
Dialogue Style: Energetic announcer, Resonant voice with gravitas, Distorted radio-style, Robotic monotone, Childlike curiosity
Volume: Whisper, Mutter, Shout, Scream

Technical Style Markers

Camera Language: Follows, Tracks, Pans across, Circles around, Tilts upward, Pushes in / pulls back, Overhead view, Handheld movement, Over-the-shoulder, Wide establishing shot, Static frame
Film Characteristics: Film grain, Lens flares, Pixelated edges, Jittery stop-motion
Scale Indicators: Expansive, Epic, Intimate, Claustrophobic
Pacing and Temporal Effects: Slow motion, Time-lapse, Rapid cuts, Lingering shot, Continuous shot, Freeze-frame, Fade-in / fade-out, Seamless transition, Sudden stop
Visual Effects: Particle systems, Motion blur, Depth of field
