- AI video models default to slow motion because gradual frame-to-frame changes are easier to maintain consistently — fixing it requires specific action verbs, environmental motion cues, and explicit camera movement descriptions in your prompt.
- For open-source pipeline users, lowering cfg_scale to 2.0–2.5 and stg_scale to 0.5–0.8 gives the model more freedom to generate fast frame-to-frame motion without sacrificing temporal coherence.
- Prompt engineering applies to all users; guidance parameter tuning is only available when running the open-source pipeline locally or via ComfyUI. Hosted API users rely on prompting alone.
You write a prompt describing a person walking down a busy street, and the output looks like a slow-motion dream sequence. Everything moves at half speed: pedestrians glide, cars drift, even the wind seems hesitant. The scene is technically coherent, the composition is solid, but the pacing feels wrong.
This is one of the most common issues in text-to-video AI generation. Diffusion models tend to favor slow, smooth motion because it is easier to maintain temporal consistency across frames. Faster motion means more pixel-level change per frame, which increases the chance of artifacts. So the model defaults to the safest option: make everything slow.
The fix is a combination of prompt engineering and (for open-source users) guidance parameter tuning. Both work together, and getting either one wrong can undo the other. Prompt techniques apply to all LTX-2 users; parameter tuning is available only when running the open-source pipeline locally or programmatically.
Why AI Video Defaults to Slow Motion
LTX-2 uses 3D RoPE positional encoding across 48 shared transformer blocks to maintain temporal consistency between frames. This architecture is what makes generated video look like video rather than a slideshow of loosely related images. But it also creates a bias: the model learns that slow, gradual transitions between frames produce the most consistent results during training.
When your prompt is vague about motion, the model has no reason to push toward faster movement. A prompt like "a woman walks down a city street" gives the model permission to interpret "walks" at any speed. It will almost always choose slower, because slower is safer.
The second factor is guidance strength. Higher Classifier-Free Guidance (cfg_scale) values constrain the output more tightly to the prompt embedding, but they also reduce the model's freedom to generate varied motion between frames. The result: strong prompt adherence with sluggish movement.
Prompt Techniques for Faster, Natural Motion
The official LTX-2 prompting guide recommends including key elements in this order: establish the shot, set the scene, describe the action with specific motion cues, define the characters, identify camera movement, and describe the audio. Keep descriptions literal and precise. Think like a cinematographer describing a shot list. For fast-paced scenes specifically, you can lead with the action to push the model toward energetic pacing from the first frame, but the documented order starts with framing and scene context before the action itself.
That principle of specificity is the key to fixing slow motion. A cinematographer does not write "person walks." They write "a woman strides briskly through a crowded intersection, weaving between pedestrians." The difference in specificity directly affects the output.
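To keep that documented ordering consistent while you iterate, a small helper can assemble the pieces. This is purely illustrative; the helper and its field names are my own sketch, not part of any LTX-2 tooling:

```python
# Assemble a prompt in the order the LTX-2 guide recommends:
# shot, scene, action, characters, camera, audio.
def build_prompt(shot, scene, action, characters, camera, audio):
    return " ".join([shot, scene, action, characters, camera, audio])

prompt = build_prompt(
    shot="Wide tracking shot.",
    scene="A rain-slicked city street at night, neon reflections streaking past.",
    action="A courier sprints through the crowd, weaving between umbrellas.",
    characters="The courier wears a bright yellow jacket and a messenger bag.",
    camera="The camera rapidly tracks alongside at street level.",
    audio="Footsteps splash, traffic hums, distant sirens rise and fall.",
)
```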
Use Action Verbs with Speed Cues
Replace static or ambiguous verbs with verbs that encode speed. "Walks" becomes "strides," "hurries," or "rushes." "Moves" becomes "darts," "sweeps," or "charges." Adding temporal modifiers reinforces the intent: "quickly," "at full speed," "in rapid succession."
Before: "A car drives along a highway at sunset."
After: "A red sports car speeds along a desert highway at golden hour, dust trailing behind it as it overtakes slower traffic. The camera tracks alongside at matching velocity."
The second prompt gives the model three motion cues: "speeds," "overtakes," and "tracks alongside at matching velocity." Each one pushes the generation toward faster movement.
Describe Environmental Motion
Subject motion alone is not enough. If your scene describes a person running but the environment is static, the model may generate a slow-moving runner against a frozen background. Describe the environment as if it is also in motion: trees blurring past, reflections streaking on wet pavement, crowds parting.
Before: "A cyclist rides through a park."
After: "A cyclist pedals rapidly through a tree-lined park path, leaves scattering in the wake. Joggers and dog walkers pass in the background. The camera follows from a low angle, tracking the spinning wheels."
Specify Camera Movement Explicitly
Camera descriptions are powerful pacing cues. A static camera implies a still, observational scene. A tracking shot implies movement. A handheld camera implies energy. LTX-2 ships with camera control LoRAs (Dolly-In, Dolly-Out, Dolly-Left, Dolly-Right, Jib-Up, Jib-Down, and Static), but even without LoRAs, describing camera motion in the prompt nudges the model toward faster pacing.
Note that the current camera control LoRAs are released for the LTX-2-19b model variant. If you are running the LTX-2.3-22b checkpoint, check the HuggingFace repository for 22B-compatible versions before applying them. The prompt-level camera language works regardless of LoRA availability.
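If you are on a compatible checkpoint, loading a camera LoRA follows the usual diffusers pattern. The sketch below uses the diffusers LTX-Video pipeline class as a stand-in, and the LoRA repository and weight file names are placeholders, not confirmed release names:

```python
import torch
from diffusers import LTXPipeline  # diffusers' LTX-Video pipeline, used here as a stand-in

pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder repo and file names: check the LTX HuggingFace page for the
# actual camera-control LoRA releases and their compatible model variants.
pipe.load_lora_weights(
    "Lightricks/LTX-2-19b-camera-loras",      # hypothetical repo id
    weight_name="ltx2_dolly_in.safetensors",  # hypothetical weight file
    adapter_name="dolly_in",
)
pipe.set_adapters(["dolly_in"], adapter_weights=[1.0])
```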
Include camera movement type (pan, track, dolly, crane), direction (left to right, pulling back, pushing in), and speed (slow pan vs. rapid tracking shot). The more specific you are, the less the model defaults to a static, slow composition.
Guidance Parameters That Affect Motion Speed
The following guidance parameters are available when using the open-source LTX-2 pipeline locally or programmatically. If you are using the hosted API at docs.ltx.video, motion pacing is controlled primarily through prompt engineering, as the API handles guidance parameters internally. ComfyUI users can access these same parameters through their node interface.
Prompt writing gets you most of the way there. But if your guidance parameters are too aggressive, they can clamp down on motion regardless of what the prompt says.
CFG Scale: The Motion Freedom Dial
The cfg_scale parameter controls how strongly the output adheres to the text prompt. Typical values range from 2.0 to 5.0. For motion-heavy scenes, staying toward the lower end of that range (2.0 to 3.0) gives the model more freedom to generate varied frame-to-frame motion. Higher values (4.0 to 5.0) produce stronger prompt adherence but can constrain the motion range, making everything feel slower and more deliberate.
If your video looks correct but moves too slowly, try reducing cfg_scale by 0.5 to 1.0 before rewriting the prompt. This single change often resolves the issue.
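As a concrete example, here is a minimal generation call with guidance lowered for a motion-heavy scene. It uses the publicly available diffusers LTX-Video pipeline as a stand-in; in diffusers the CFG knob is named guidance_scale, while the LTX-2 open-source pipeline exposes it as cfg_scale:

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained(
    "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
).to("cuda")

frames = pipe(
    prompt=(
        "A red sports car speeds along a desert highway at golden hour, "
        "dust trailing behind it as it overtakes slower traffic. "
        "The camera tracks alongside at matching velocity."
    ),
    guidance_scale=2.5,      # diffusers' name for the CFG knob; low for motion
    num_frames=121,          # roughly 5 seconds at 25 FPS
    num_inference_steps=40,
).frames[0]
export_to_video(frames, "fast_car.mp4", fps=25)
```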
STG Scale: Temporal Coherence vs. Motion Range
The stg_scale parameter (Spatio-Temporal Guidance) improves temporal coherence by perturbing specific transformer blocks during inference. Values between 0.5 and 1.5 are typical, with the default perturbation targeting block 29.
Higher stg_scale values produce more consistent frame-to-frame results, but they also reduce the acceptable range of motion between frames. For fast-moving scenes, try values closer to 0.5. For scenes where you need both speed and strict consistency (like a camera dolly through architecture), balance at around 0.8 to 1.0.
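A hypothetical call shape built from the parameter names in this section, assuming a pipe object like the one constructed above. Verify the exact signature against your LTX-2 pipeline version; stg_blocks in particular is an assumed keyword name, not a confirmed one:

```python
# Sketch only: parameter names follow this section's discussion, not a
# verified LTX-2 signature.
video = pipe(
    prompt=prompt,
    cfg_scale=2.5,       # keep CFG low for motion-heavy scenes
    stg_scale=0.8,       # lower STG widens the allowed frame-to-frame change
    stg_blocks=[29],     # default perturbation target per the section above
    rescale_scale=0.5,   # re-match variance so low CFG does not wash out colors
)
```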
Recommended Starting Points
High-motion scene (action, sports, fast camera movement): cfg_scale 2.0 to 2.5, stg_scale 0.5 to 0.8, rescale_scale 0.5
Moderate-motion scene (walking, conversation, gentle camera pan): cfg_scale 3.0, stg_scale 1.0, rescale_scale 0.7
Low-motion scene (landscape, still life, slow reveal): cfg_scale 3.5 to 4.0, stg_scale 1.0 to 1.5, rescale_scale 0.7
These are starting points; adjust for your specific content. The moderate-motion values match the documented example configuration in the LTX-2 docs, while the high-motion and low-motion values are practical extrapolations from the documented parameter behavior, not official presets. The rescale_scale parameter prevents over-saturation by rescaling the guided prediction to match the variance of the conditional prediction, which matters more at lower cfg_scale values where the output has more freedom to drift.
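One way to keep these starting points handy is a small preset table. The helper below is illustrative only, with midpoints chosen from the ranges above; only the moderate preset mirrors the documented example configuration:

```python
# Illustrative presets; only "moderate" mirrors the documented example config.
MOTION_PRESETS = {
    "high":     {"cfg_scale": 2.25, "stg_scale": 0.65, "rescale_scale": 0.5},
    "moderate": {"cfg_scale": 3.0,  "stg_scale": 1.0,  "rescale_scale": 0.7},
    "low":      {"cfg_scale": 3.75, "stg_scale": 1.25, "rescale_scale": 0.7},
}

def generate(pipe, prompt, motion="moderate", **overrides):
    """Run the pipeline with a named preset; explicit overrides win."""
    params = {**MOTION_PRESETS[motion], **overrides}
    return pipe(prompt=prompt, **params)
```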
Common Pacing Mistakes and How to Fix Them
Prompt describes a fast scene but output is slow: Check cfg_scale first. Values above 4.0 constrain motion. Drop to 2.5 and regenerate before rewriting the prompt (see the sweep sketch after this list).
Parts of the scene move fast but the subject is slow: Your prompt likely describes environmental motion but not subject motion. Add explicit action verbs for the subject: "sprints," "dashes," "whips around."
Motion is fast but looks unnatural: stg_scale may be too low, causing frame-to-frame inconsistency. Increase to 0.8 to 1.0 for better temporal coherence without killing speed.
Camera movement feels sluggish: Describe camera velocity explicitly in the prompt. "The camera rapidly tracks left" is better than "the camera follows." Consider the camera control LoRAs (Dolly-In/Out/Left/Right, Jib-Up/Down) for precise camera motion, keeping in mind that the current releases target the 19b variant.
Everything looks like slow-motion replay: Your prompt may be too short or too abstract. The LTX-2 prompting guide recommends aiming for 4-8 descriptive sentences. Make every sentence count, and include at least one motion cue per sentence.
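For the first item in the list above, a fixed-seed sweep isolates the effect of guidance strength. This assumes the pipe and prompt from the earlier sketches and diffusers' guidance_scale naming:

```python
import torch

# Step CFG down with the seed held constant, so guidance strength is the
# only thing changing between runs.
for cfg in (4.0, 3.0, 2.5):
    generator = torch.Generator("cuda").manual_seed(42)
    frames = pipe(prompt=prompt, guidance_scale=cfg, generator=generator).frames[0]
    # inspect each result and stop at the first with natural pacing
```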
Slow Motion Fix: Common Questions
Does frame rate affect perceived motion speed? The frame rate (typically 25 FPS for LTX-2) determines playback smoothness, not motion speed. A video generated at 25 FPS with slow motion will still look slow. The motion speed is determined by how much pixel-level change the model generates between consecutive frames, which is controlled by prompting and guidance parameters.
Can prompt enhancement help with pacing? LTX-2 pipelines support automatic prompt enhancement via the enhance_prompt parameter. This can add detail to sparse prompts, but it may not add motion-specific cues. For pacing control, write your own detailed prompts rather than relying on enhancement alone.
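In code, that means leaving enhancement off and supplying your own motion-rich prompt. A sketch assuming the pipe from the earlier examples; verify the enhance_prompt flag name and default against your pipeline version:

```python
video = pipe(
    prompt=prompt,          # hand-written, with one motion cue per sentence
    enhance_prompt=False,   # flag name as discussed above; check your version
)
```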
Should I use the Dev or Distilled model for fast motion? Both models respond to the same prompt techniques. The Dev model offers more control through guidance parameters (CFG, STG), which makes it better for fine-tuning motion pacing. The Distilled model uses a fixed sigma schedule with fewer steps and no guidance, so motion characteristics depend more heavily on the prompt itself.
Getting Natural Pacing Right
Fixing slow motion in AI-generated video comes down to two things: telling the model exactly how fast things should move, and (when working in the open-source pipeline) giving it the parameter space to execute. Write prompts that encode speed at every level: subject action, environmental motion, and camera movement. Then set guidance parameters that allow the model room to generate frame-to-frame variation instead of clamping everything to a safe, slow default. API users can rely on the prompting half of this playbook; open-source and ComfyUI users get both halves.
For artifact reduction techniques that complement fast-motion generation, and for the full LTX-2.3 prompt guide, check the linked resources.