Tutorials

How To Use AI Retake To Fix & Regenerate Specific Video Segments

Learn how to fix bad segments in AI-generated video without re-rendering the full clip, using LTX-2's RetakePipeline for targeted, selective regeneration.

LTX Team
Key Takeaways:
  • LTX-2's RetakePipeline lets you regenerate a specific time window within an existing video while leaving everything outside that window untouched — fixing artifacts or motion issues without re-rolling the entire clip.
  • Video and audio tracks can be regenerated independently, and the pipeline supports both CLI and Python API access, with a hosted API endpoint available for those without a local GPU.
  • Best practice is to use short regeneration windows with prompts that match the surrounding visual context, iterate with the distilled model for speed, then finalize with the dev model for maximum quality.

AI video generation rarely produces a perfect result on the first attempt. A five-second clip might have four seconds of exactly what you wanted and one second of visual artifacts, unnatural motion, or a character suddenly shifting appearance. Without selective editing capabilities, the only option is to regenerate the entire video and hope the problematic section improves without breaking what already worked.

LTX-2 solves this with the RetakePipeline, a dedicated pipeline that lets you regenerate a specific time region of an existing video while keeping everything outside that window untouched. Instead of rolling the dice on an entirely new generation, you surgically fix the section that needs work.

This tutorial walks through how the RetakePipeline works, how to configure it via CLI and Python, the constraints you need to respect, and practical strategies for getting the most out of selective regeneration in your production workflows.

What the RetakePipeline Does

The RetakePipeline is a single-stage generation pipeline designed for one specific task: regenerating a defined time window within an existing video. It encodes the source video and audio into latents, applies a temporal region mask to mark the [start_time, end_time] range for regeneration, and denoises only the masked region from a text prompt. Content outside the time window is preserved exactly as it was in the original.

This makes it fundamentally different from generating a new video. You are not starting from scratch. You are keeping the parts that work and replacing the parts that do not.
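To make the masking step concrete, here is an illustrative Python sketch of how a [start_time, end_time] window could map onto a temporal latent mask. This is not the pipeline's actual implementation: the 8x temporal compression factor is inferred from the 8k+1 frame-count constraint, and the fps value is a placeholder.

import math
import torch

# Illustrative only: how a retake window could map onto a temporal latent mask.
# The real RetakePipeline internals may differ; the 8x temporal factor is
# inferred from the 8k+1 frame-count constraint.
def temporal_retake_mask(num_frames: int, fps: float,
                         start_time: float, end_time: float,
                         temporal_factor: int = 8) -> torch.Tensor:
    num_latent_frames = (num_frames - 1) // temporal_factor + 1
    mask = torch.zeros(num_latent_frames, dtype=torch.bool)
    start_idx = int(start_time * fps) // temporal_factor
    end_idx = math.ceil(end_time * fps / temporal_factor)
    # True = inside the retake window, gets denoised; False = preserved as-is.
    mask[start_idx:min(end_idx + 1, num_latent_frames)] = True
    return mask

# A 97-frame, 24 fps clip: regenerate seconds 2.0-4.0, keep the rest.
print(temporal_retake_mask(97, 24.0, 2.0, 4.0))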

Independent Video and Audio Control

The RetakePipeline supports independent control over video and audio regeneration through two flags: regenerate_video and regenerate_audio. You can regenerate just the video track while preserving the original audio, regenerate just the audio while keeping the video, or regenerate both simultaneously. This level of granularity is particularly useful when the visual content is fine but the audio in a specific segment needs reworking, or vice versa.

Model Compatibility

The RetakePipeline can use either the full model with classifier-free guidance (CFG) or the distilled model with a fixed sigma schedule. The full model gives stronger prompt adherence through CFG (typical cfg_scale values of 2.0 to 5.0), while the distilled model trades some prompt adherence for faster inference with 8 predefined sigmas.

Source Video Constraints

Before using the RetakePipeline, your source video must meet several specific requirements:

Frame count must satisfy the 8k+1 format. The total number of frames in the source video must satisfy (F-1) % 8 == 0, which gives values like 97, 193, or 201. If your source video does not meet this constraint, you will need to trim or pad it before passing it to the pipeline.

Resolution must be multiples of 32. Both width and height of the source video must be divisible by 32. This is a standard requirement across LTX-2 pipelines and ensures proper latent space alignment.

Supported input formats. The source video should be an MP4 file with H.264 or H.265 encoding. Videos in other containers or codecs may need to be transcoded before being passed to the pipeline.

System requirement. Running the open-source pipeline locally requires CUDA 13+. Earlier CUDA versions are not supported.

These constraints exist because of how the diffusion transformer processes video latents. The temporal and spatial dimensions must align with the model's internal patchification and positional encoding structure.
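As a practical pre-flight check, a small helper like the following (a sketch, not part of the LTX-2 codebase) can verify both constraints and suggest the nearest valid frame count before you invoke the pipeline:

# Sketch of a pre-flight validator for RetakePipeline source videos.
# Not part of the LTX-2 codebase; adapt to your own tooling.
def nearest_valid_frame_count(frames: int) -> int:
    """Largest frame count <= frames satisfying (F - 1) % 8 == 0."""
    return ((frames - 1) // 8) * 8 + 1

def check_source(frames: int, width: int, height: int) -> list[str]:
    problems = []
    if (frames - 1) % 8 != 0:
        problems.append(
            f"{frames} frames violates 8k+1; trim to {nearest_valid_frame_count(frames)}"
        )
    if width % 32 or height % 32:
        problems.append(f"{width}x{height} is not a multiple of 32 in both dimensions")
    return problems

print(check_source(194, 1280, 704))  # frame-count problem only: trim to 193
print(check_source(193, 1280, 720))  # resolution problem: 720 is not divisible by 32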

Don't Have a Local GPU?

If you do not have a local GPU that meets the requirements, the LTX API provides a hosted retake endpoint (POST /v1/retake) that handles video editing server-side. The hosted endpoint uses a simpler parameter schema (no frame-count or resolution alignment to manage manually) and supports the same selective-regeneration use cases. See docs.ltx.video for details. The remainder of this tutorial focuses on the open-source pipeline.

How to Run Retake from the CLI

The RetakePipeline is available as a command-line module, which makes it straightforward to integrate into scripted workflows. Here is the basic structure of a retake command:

python -m ltx_pipelines.retake \
   --checkpoint-path path/to/checkpoint.safetensors \
   --gemma-root path/to/gemma \
   --video-path path/to/source_video.mp4 \
   --start-time 2.0 \
   --end-time 4.0 \
   --prompt "A person walking steadily through a sunlit park" \
   --output-path output_retake.mp4

The three required arguments specific to the RetakePipeline are --video-path (the source video to edit), --start-time (where the regeneration begins, in seconds), and --end-time (where it ends). Everything between start-time and end-time is regenerated from the text prompt. Everything outside that window is preserved from the source video.
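Because the retake module is an ordinary Python CLI, it is also straightforward to drive from a script. The following sketch wraps the command shown above with subprocess for batch workflows; the checkpoint and Gemma paths are placeholders to replace with your own.

# Minimal wrapper around the retake CLI for scripted use.
# Paths are placeholders; substitute your own checkpoint and Gemma locations.
import subprocess

def run_retake(video: str, start: float, end: float, prompt: str, output: str) -> None:
    cmd = [
        "python", "-m", "ltx_pipelines.retake",
        "--checkpoint-path", "path/to/checkpoint.safetensors",
        "--gemma-root", "path/to/gemma",
        "--video-path", video,
        "--start-time", str(start),
        "--end-time", str(end),
        "--prompt", prompt,
        "--output-path", output,
    ]
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure

run_retake("source_video.mp4", 2.0, 4.0,
           "A person walking steadily through a sunlit park",
           "output_retake.mp4")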

You can view all available options by running:

python -m ltx_pipelines.retake --help

Using Retake Programmatically

For developers building custom pipelines or batch-processing workflows, the RetakePipeline is also accessible programmatically in Python. Import the pipeline class, configure your model paths and generation parameters, and call it directly.

This section covers the local open-source pipeline only. For the hosted REST API equivalent, see POST /v1/retake at docs.ltx.video, which uses a different parameter schema (start_time and duration instead of start_time/end_time, a mode enum instead of separate regenerate_video/regenerate_audio flags, and no exposed guidance parameters).
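For reference, a hosted call might look like the sketch below. The start_time, duration, and mode parameters are named in the documentation; the base URL, authentication scheme, and the field names for the prompt and video input are assumptions here, so consult docs.ltx.video for the authoritative schema.

# Hypothetical call to the hosted retake endpoint. start_time, duration, and
# mode are documented; the base URL, auth header, and the video/prompt field
# names are assumptions -- see docs.ltx.video for the real schema.
import requests

resp = requests.post(
    "https://api.ltx.video/v1/retake",                     # assumed base URL
    headers={"Authorization": "Bearer YOUR_API_KEY"},      # assumed auth scheme
    json={
        "video_url": "https://example.com/source_video.mp4",  # assumed field name
        "prompt": "A person walking steadily through a sunlit park",
        "start_time": 2.0,
        "duration": 2.0,   # hosted schema uses duration, not end_time
        "mode": "video",   # assumed enum value for video-only regeneration
    },
)
resp.raise_for_status()
print(resp.json())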

The local pipeline accepts the same classifier-free guidance parameters as other LTX-2 pipelines through MultiModalGuiderParams. You can set cfg_scale for prompt adherence, stg_scale for spatio-temporal guidance to improve coherence, and modality_scale to control audio-visual synchronization.
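Putting the pieces together, here is a hypothetical sketch of a programmatic retake call. MultiModalGuiderParams and the regenerate_video / regenerate_audio flags come from the documentation above; the import paths, constructor arguments, and call signature are assumptions, so check the LTX-2 repository for the actual API.

# Illustrative sketch of the programmatic interface. The import paths,
# constructor arguments, and call signature are assumptions; only
# MultiModalGuiderParams and the regenerate flags are documented.
from ltx_pipelines.retake import RetakePipeline            # assumed import path
from ltx_pipelines.guiders import MultiModalGuiderParams   # assumed import path

guider = MultiModalGuiderParams(
    cfg_scale=3.0,        # prompt adherence (typical range 2.0-5.0 on the full model)
    stg_scale=1.0,        # spatio-temporal guidance for coherence
    modality_scale=1.0,   # audio-visual synchronization
)

pipeline = RetakePipeline(
    checkpoint_path="path/to/checkpoint.safetensors",
    gemma_root="path/to/gemma",
)

pipeline(
    video_path="source_video.mp4",
    prompt="A person walking steadily through a sunlit park",
    start_time=2.0,
    end_time=4.0,
    regenerate_video=True,
    regenerate_audio=False,   # keep the original audio track untouched
    guider_params=guider,
    output_path="output_retake.mp4",
)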

When using the RetakePipeline programmatically, remember to set the PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True environment variable if you are also using FP8 quantization for a lower memory footprint.
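In practice, set the variable before torch initializes CUDA, for example:

# Set before torch touches CUDA so the allocator picks up the setting.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after the env var on purpose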

Practical Strategies for Effective Retakes

Start with Short Regeneration Windows

The narrower the time window, the more likely the regenerated segment will blend seamlessly with the surrounding content. A one-second retake is almost always more consistent than a three-second retake, because there is less room for the model to drift away from the visual context established by the preserved frames.

If a longer segment needs fixing, consider breaking it into multiple shorter retakes rather than regenerating the entire problematic region in one pass. Fix the worst section first, then evaluate whether adjacent frames still need attention.
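One way to chain short passes is sketched below: each pass retakes a one-second window of the previous pass's output. The run_retake helper simply shells out to the CLI shown earlier, with placeholder model paths.

# Sketch: fix a 3-second problem region as three 1-second passes instead of
# one wide retake. Model paths are placeholders.
import subprocess

def run_retake(video: str, start: float, end: float, prompt: str, output: str) -> None:
    subprocess.run([
        "python", "-m", "ltx_pipelines.retake",
        "--checkpoint-path", "path/to/checkpoint.safetensors",
        "--gemma-root", "path/to/gemma",
        "--video-path", video, "--start-time", str(start),
        "--end-time", str(end), "--prompt", prompt, "--output-path", output,
    ], check=True)

source = "source_video.mp4"
prompt = "A person walking steadily through a sunlit park"
for i, (start, end) in enumerate([(2.0, 3.0), (3.0, 4.0), (4.0, 5.0)]):
    out = f"pass_{i}.mp4"
    run_retake(source, start, end, prompt, out)
    source = out  # each pass builds on the previous result

Review each intermediate output before running the next pass; often the first one or two passes are enough.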

Write Prompts That Match the Existing Video

The text prompt you provide for a retake should describe what you want to see in the regenerated segment, but it should also be consistent with the visual context of the surrounding frames. A retake prompt that describes a completely different scene, lighting condition, or character appearance will produce a jarring transition at the boundaries of the regenerated region.

For best results, write prompts that maintain the same subject, environment, lighting, and camera angle described in the original generation. Focus the prompt on correcting the specific issue: replacing unnatural motion with natural movement, fixing a character's expression, or smoothing a transition between shots.

Use Retake to Smooth Jump Cuts

One particularly effective experimental use case is smoothing transitions between separately generated clips. If you have two shots that were generated independently and you want to join them into a continuous sequence, you can concatenate the clips, identify the transition point, and use the RetakePipeline to regenerate the frames around the jump cut. The model will attempt to create a smooth visual bridge between the two shots, eliminating the jarring discontinuity.

This technique works best when the two shots share similar lighting, framing, and subject matter. The smaller the visual gap between the shots, the more natural the smoothed transition will appear.

One important gotcha: after concatenating clips, the total frame count must still satisfy (F-1) % 8 == 0. Two 97-frame clips concatenated yield 194 frames, and (194-1) % 8 = 1, which fails the constraint. Trim or pad the concatenated video to a valid frame count (such as 193 or 201) before running the RetakePipeline.
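One way to handle this is sketched below using ffmpeg and ffprobe (assumed to be on your PATH): concatenate the clips, count frames, and trim to the nearest valid count. Stream-copy concatenation assumes both clips share codec parameters, which holds for clips from the same LTX-2 pipeline.

# Sketch: concatenate two clips, then trim to the nearest 8k+1 frame count
# before retaking the seam. Assumes ffmpeg/ffprobe on PATH and that both
# clips share codec parameters.
import subprocess

def frame_count(path: str) -> int:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-count_frames", "-show_entries", "stream=nb_read_frames",
         "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())

with open("list.txt", "w") as f:
    f.write("file 'shot_a.mp4'\nfile 'shot_b.mp4'\n")

subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                "-i", "list.txt", "-c", "copy", "joined.mp4"], check=True)

total = frame_count("joined.mp4")     # e.g. 194 for two 97-frame clips
valid = ((total - 1) // 8) * 8 + 1    # largest valid count <= total, e.g. 193
subprocess.run(["ffmpeg", "-y", "-i", "joined.mp4", "-frames:v", str(valid),
                "-shortest", "joined_trimmed.mp4"], check=True)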

Iterate with the Distilled Model, Finalize with Dev

When experimenting with retakes, use the distilled model for faster iteration. The distilled pipeline runs with 8 predefined sigmas and provides quick feedback on whether your prompt and time window are producing the desired result. Once you are satisfied with the direction, switch to the full dev model for the final render to get maximum quality and prompt adherence.

When to Use Retake vs. Regenerating the Full Video

The RetakePipeline is the right tool when:

• Most of the video is good and only a specific time segment needs fixing

• You want to preserve exact audio-visual sync outside the edit region

• You are smoothing transitions between concatenated clips

• You need to fix a brief artifact or motion glitch without losing surrounding context

Full regeneration is the better choice when:

• The fundamental composition, lighting, or subject is wrong throughout

• You want to explore entirely different creative directions

• The source video does not meet the frame count (8k+1) or resolution (multiples of 32) constraints required by RetakePipeline

For most iterative production workflows, retake becomes the primary editing loop: generate once, review the output, identify the weakest segments, and fix them in place. This is significantly faster than regenerating entire clips and hoping the sections that were already good remain good.

Getting Started

The RetakePipeline is available in the LTX-2 open-source repository alongside all other pipelines. If you have already set up LTX-2 for text-to-video or image-to-video generation, you already have everything you need. Point the pipeline at your source video, define the time window, write a prompt, and start fixing the segments that need it.

For developers who want to understand the full pipeline architecture and how RetakePipeline fits alongside other generation modes, the artifact reduction guide covers complementary techniques for improving output quality before and after retake.
