
Lipsync vs. LipDub in LTX-2.3: Differences & When To Use Each

LTX-2.3 lipsync generates talking video from audio. LipDub redubs existing footage via IC-LoRA. Here's when to use each in production.

Rachel Luxemburg
Key Takeaways:
  • LTX-2.3 has two distinct talking-character tools: lipsync generates new video synchronized to an audio input, while LipDub uses an IC-LoRA adapter to redub existing footage with new dialogue from a text prompt while preserving the speaker's appearance and scene context.
  • The core distinction is creation vs. transformation: use lipsync when no footage exists yet, and LipDub when you have locked footage and need to change what the character is saying.
  • In production, the two tools work sequentially: use lipsync to create the initial scene, then LipDub to revise dialogue or produce localized language versions without regenerating the visual performance.

LTX-2.3 has two ways to make characters speak. One takes audio; the other takes text. The two are frequently confused, but they use different pipelines and serve different stages of production.

What Is Lipsync?

Lipsync is a native capability of LTX-2.3. The model generates new video with mouth movements synchronized to an audio input, using a standard audio-to-video pipeline. No adapters or LoRAs required.

You provide an audio clip (and optionally a reference image or text prompt), and LTX-2.3 generates a new video of a character speaking those words with synchronized lip movements. 

What’s key is that lipsync generates entirely new content. There is no pre-existing video. The entire output is generated by the model from the inputs provided.

What Is LipDub?

LipDub is a specialized IC-LoRA adapter for LTX-2.3 that redubs existing footage. It replaces the spoken dialogue but preserves the speaker's appearance and the scene context.

You provide the video to be dubbed and a text prompt containing the new dialogue. The model generates a new version of that video where the character's lip movements match the new dialogue. Identity, background, and camera angle are preserved as much as possible.

What’s key is that LipDub transforms existing content. The input video is the creative anchor; the IC-LoRA’s job is to change the dialogue and disturb as little else as possible.

The Big Question: "Can I Use LipDub with My Own Audio?"

No. This is the question we hear most, and it's the root of nearly all the confusion between the two features.

LipDub is text-driven. You place the new dialogue in the prompt, and the model generates video and audio jointly. The reference video's audio is used only for voice conditioning, not as the dialogue input.

If you have an audio file that you want a character to speak, use lipsync. If you are redubbing existing footage but need precise control over the final audio (for instance, a specific TTS voice), use LipDub for the visual output and handle the audio mix in post.

When to Use Each

The decision comes down to whether you're creating new footage or transforming existing footage.

Use lipsync when:

  • You're generating a talking-head video from scratch 
  • You have a voiceover or dialogue track and need to create a character speaking those words
  • You're building scenes where no reference footage exists yet
  • You want the model to have full creative freedom over the visual output

Use LipDub when:

  • You have existing footage of a character speaking and need to change what they're saying
  • You're translating dialogue into another language and need the lips to match
  • You're doing dialogue corrections on footage that's otherwise final
  • You need to preserve the visual identity, scene context, and camera work of the original
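The two checklists, combined with the audio rule from the previous section, reduce to a small decision. The sketch below encodes that guidance as a plain function. `ShotInputs` and `choose_tool` are hypothetical names used only for illustration. Nothing here calls LTX-2.3 or any real LipDub API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotInputs:
    """What you have on hand for a shot (hypothetical helper, not an LTX-2.3 API)."""
    existing_footage: bool  # locked footage of the character already exists
    dialogue_audio: bool    # you have an audio track the character must speak


def choose_tool(shot: ShotInputs) -> str:
    """Rule of thumb from this article: lipsync creates, LipDub transforms."""
    if shot.existing_footage:
        if shot.dialogue_audio:
            # LipDub is text-driven and cannot take your audio as dialogue,
            # so use it for the visuals and handle the audio mix in post.
            return "lipdub + mix audio in post"
        # New dialogue goes in the text prompt; identity, scene context,
        # and camera work of the input video are preserved where possible.
        return "lipdub"
    if shot.dialogue_audio:
        # No footage yet: generate a new talking video from the audio.
        return "lipsync"
    raise ValueError("lipsync needs an audio input; LipDub needs existing footage")


if __name__ == "__main__":
    print(choose_tool(ShotInputs(existing_footage=False, dialogue_audio=True)))
    print(choose_tool(ShotInputs(existing_footage=True, dialogue_audio=False)))
```

The function raises an error when there is neither footage nor audio. That reflects the inputs described here: lipsync is driven by an audio clip, and LipDub needs a video to redub.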

How They Work Together

In a production pipeline, lipsync and LipDub are tools for different stages of the process.

Consider a multilingual short film workflow. You might use lipsync during initial production to generate establishing shots of characters speaking, creating the "hero" footage in the primary language. Once that footage is locked, you'd run LipDub to produce localized versions: same visual performance, different language.

Or take an iterative creative workflow. You generate an initial scene with lipsync, review it, and decide the delivery is right but the dialogue needs a rewrite. Rather than regenerating the entire scene and losing the visual performance you liked, you run LipDub on your existing output to swap in the revised lines.

The pattern is: lipsync to create, LipDub to revise.

Getting Started

Lipsync:

LipDub:

A note on audio output: LipDub generates new audio conditioned on the reference video's audio characteristics. It does not pass through the original track. If you need exact preservation of your original background music, SFX, or ambient audio, separate those beforehand and remix in post.
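As one illustration of "remix in post," the sketch below builds an ffmpeg command that mixes a background stem (music, SFX, ambience) under the audio of a LipDub render. It assumes ffmpeg as the post tool and that the stem was already separated from the original mix. It also assumes the render's first audio stream carries the new dialogue you want to keep. The file names and the `build_remix_command` helper are hypothetical. Any editor or DAW would do the same job.

```python
import shlex


def build_remix_command(dubbed_video: str, background_stem: str, output: str) -> list[str]:
    """Build (but do not run) an ffmpeg command that mixes a separated
    background stem under the audio of a LipDub render.

    Assumes input 0's first audio stream holds the new dialogue and that
    `background_stem` was separated from the original mix beforehand.
    """
    return [
        "ffmpeg",
        "-i", dubbed_video,     # input 0: LipDub output (video + generated dialogue)
        "-i", background_stem,  # input 1: original music/SFX/ambience stem
        "-filter_complex",
        # Mix both audio streams; stop when the dubbed video's audio ends.
        "[0:a][1:a]amix=inputs=2:duration=first[aout]",
        "-map", "0:v",          # keep the LipDub video stream
        "-map", "[aout]",       # use the mixed audio
        "-c:v", "copy",         # copy video without re-encoding
        "-c:a", "aac",          # encode the mixed audio for an MP4 container
        output,
    ]


if __name__ == "__main__":
    cmd = build_remix_command("shot_es_lipdub.mp4", "shot_background.wav", "shot_es_final.mp4")
    # Print a shell-safe version; pass `cmd` to subprocess.run(cmd, check=True)
    # to execute it (requires ffmpeg on PATH).
    print(shlex.join(cmd))
```

Levels will usually need adjusting after the mix. Whether to keep any ambience that LipDub itself generates is a creative call for your post workflow.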
