- IC-LoRA (In-Context LoRA) transfers structure and motion from reference videos into LTX-2 generations, enabling precise control over camera movement, scene geometry, and human performance without relying on text prompts.
- Three IC-LoRA modes: Canny (edge preservation), Depth (camera/3D geometry), Pose (human motion)
- Works in both image-to-video and text-to-video modes
- Motion comes from reference video; prompts control visual style only
- Run one IC-LoRA group at a time to avoid memory issues
- Similar to ControlNet for images, but applied frame-by-frame across video
- Best results require aligning the first frame with reference video in I2V mode
- Quick workflow: Load reference video → Choose IC-LoRA mode (Canny/Depth/Pose) → Write style-focused prompt → Generate with motion locked to reference.
IC-LoRA (In-Context LoRA) is a powerful control mechanism in LTX-2 that separates motion from visual styling. Instead of describing camera movements or character actions in text prompts, IC-LoRA extracts structure and motion directly from reference videos and applies them to entirely new generations.
This tutorial walks through the official LTX-2 IC-LoRA workflow in ComfyUI, explaining when to use each IC-LoRA mode, how the preprocessing pipeline works, and best practices for high-quality motion transfer.
What Is IC-LoRA?
IC-LoRA (In-Context LoRA) transfers the structure and motion from a reference video onto a completely new generation while allowing you to reimagine the visual style through text prompts.
The workflow enables you to:
- Lock camera movements from reference footage
- Preserve scene geometry and spatial relationships
- Transfer human motion and performance to different visual styles
- Maintain temporal consistency across frames
This approach is conceptually similar to ControlNet for images, but applied across multiple frames in video.
The Three IC-LoRA Modes
The LTX-2 IC-LoRA workflow includes three separate IC-LoRA groups, each designed for different types of structural control.
Canny IC-LoRA: Edge Preservation
What it extracts: Edge maps and outlines from the reference video
Best for:
- Preserving shot composition and framing
- Maintaining silhouettes and structural boundaries
- Stylized outputs where clean shapes matter more than photorealism
Use case: When you need to keep the compositional structure of a shot while changing everything else—ideal for animation, illustration styles, or abstract visuals.
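To make the extraction step concrete, here is a minimal OpenCV sketch of the kind of per-frame edge map a Canny preprocessor produces. This illustrates the idea only; the thresholds and file path are placeholder values, not the workflow's internal settings.

```python
import cv2

# Minimal sketch: per-frame Canny edge maps from a reference video.
# Thresholds (100, 200) and the path are illustrative values.
cap = cv2.VideoCapture("reference.mp4")
edge_frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edge_frames.append(cv2.Canny(gray, 100, 200))  # one edge map per frame
cap.release()
```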
Depth IC-LoRA: Camera Movement and 3D Geometry
What it extracts: Depth maps and spatial scene geometry
Best for:
- Camera movement (pan, tilt, dolly, crane shots)
- 3D structure and spatial relationships
- Parallax and depth-aware motion
Use case: Cinematic shots where camera motion is critical—tracking shots, establishing shots, or any scenario requiring precise camera control and spatial coherence.
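For intuition, the sketch below estimates a relative depth map for a single frame with a small MiDaS model loaded through torch.hub. The workflow uses its own depth preprocessor node, so treat the model choice and paths here as assumptions.

```python
import cv2
import torch

# Minimal sketch: relative depth for one reference frame via MiDaS.
# The workflow's depth node may use a different model entirely.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

frame = cv2.imread("reference_frame.png")         # placeholder path
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
batch = transforms.small_transform(rgb)           # resize + normalize

with torch.no_grad():
    depth = midas(batch).squeeze().cpu().numpy()  # relative depth map
```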
Pose IC-LoRA: Human Motion Transfer
What it extracts: Skeleton joints and body movement using DWPose
Best for:
- Dance choreography
- Complex body movements
- Athletic or action sequences
- Performance capture
Use case: Transferring human motion to different characters or visual styles while preserving the exact timing and performance of the original—perfect for stylized character animation or motion retargeting.
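The output of this stage is a skeleton image per frame. The sketch below renders one from hand-picked 2D keypoints to show the idea; in the actual workflow, DWPose detects these joints automatically, so the coordinates and bone pairs here are purely hypothetical.

```python
import cv2
import numpy as np

# Minimal sketch: render a skeleton conditioning image from 2D joints.
# Keypoints and bone pairs are hypothetical; DWPose detects real ones.
keypoints = np.array([
    [320, 100],   # head
    [320, 180],   # neck
    [250, 260],   # left hand
    [390, 260],   # right hand
    [320, 320],   # hips
    [280, 460],   # left foot
    [360, 460],   # right foot
])
bones = [(0, 1), (1, 2), (1, 3), (1, 4), (4, 5), (4, 6)]

canvas = np.zeros((512, 640, 3), dtype=np.uint8)  # black background
for a, b in bones:
    pt_a = tuple(int(v) for v in keypoints[a])
    pt_b = tuple(int(v) for v in keypoints[b])
    cv2.line(canvas, pt_a, pt_b, (255, 255, 255), thickness=3)
for x, y in keypoints:
    cv2.circle(canvas, (int(x), int(y)), radius=5, color=(0, 0, 255), thickness=-1)
```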
Workflow Setup: Loading Reference Assets
Step 1: Load Reference Video and Image
The workflow requires two inputs:
Load Video node:
Upload your reference video containing the motion you want to transfer.
Reference Image:
Even when running in text-to-video mode, an image input is required for the workflow to execute.

Step 2: Choose Generation Mode
The workflow supports two modes controlled by the text-to-video switch (set to false by default):
Image-to-Video Mode (switch = false)
- Uses a custom-generated image as the starting point
- Animates from that image while following the reference video's motion
- Critical: For best results, generate your starting image with ControlNet or another image model, conditioned on the first frame of your reference video
- Misalignment between your starting image and reference video can cause jump cuts or visual artifacts
Text-to-Video Mode (switch = true)
- Generates all video pixels from scratch based on your prompt
- Still strictly adheres to the structure and motion of the reference video
- No image alignment required, but motion control comes entirely from the reference
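As a plain-Python illustration of the switch semantics (the names below are hypothetical, not actual ComfyUI node identifiers):

```python
# Illustrative only: how the text-to-video switch selects the path.
# Names are hypothetical, not actual node identifiers.
text_to_video = False  # workflow default

if text_to_video:
    # T2V: all pixels come from the prompt; structure and motion still
    # come from the IC-LoRA guidance on the reference video.
    start_image = None
else:
    # I2V: animate from a supplied image, ideally generated from the
    # reference video's first frame to avoid jump cuts.
    start_image = "first_frame.png"
```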

Prompting Strategy for IC-LoRA
Since motion is handled by the IC-LoRA itself, your prompting approach must change. For comprehensive prompting techniques and examples tailored to LTX-2, refer to the LTX-2 Prompting Guide.
What to Include in Prompts
Focus on visual characteristics only:
- Visual style and aesthetic (e.g., "cinematic noir lighting", "watercolor painting style")
- Subject description (character appearance, clothing, features)
- Background elements (environment, scenery, objects)
- Textures and materials (surface qualities, finishes)
- Lighting (mood, time of day, color grading)
What NOT to Include
Avoid describing motion:
"Camera pans left"
"Character walks forward"
"Slow zoom in"
Motion is automatically derived from the reference video. Describing it in the prompt creates conflicting instructions and degrades output quality.
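For example, a style-focused prompt paired with a Depth IC-LoRA dolly shot might read like this (illustrative, not from the official workflow):

```text
Cinematic noir: a detective in a gray trench coat on a rain-slicked
city street at night, neon signs reflecting off wet asphalt, shallow
depth of field, heavy film grain, teal-and-amber color grading.
```

Note that the prompt never mentions the dolly move; the camera motion comes entirely from the reference video.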

Inside the IC-LoRA Workflow Architecture
Most of the workflow is packaged into subgraphs for convenience, but understanding the internal structure helps with customization.
Core Components (Using Pose IC-LoRA as Example)
Model Loading:
- IC-LoRA loader – Loads the specific IC-LoRA (Canny, Depth, or Pose)
- LTX-2 checkpoint – Base video generation model
- LTX upsampler model – Handles resolution upscaling
- Gemma CLIP encoder – Processes text prompts
- Audio VAE – Decodes audio latents
The Preprocessing Pipeline
Step 1: Video Preprocessing
The reference video passes through a mode-specific preprocessor:
- Canny mode: Edge detection preprocessor
- Depth mode: Depth map extraction
- Pose mode: DWPose (extracts skeleton and joint positions)
This preprocessing extracts the structural information that will guide generation.
Step 2: Prompt Enhancement
In text-to-video mode, prompts pass through the custom prompt enhancer node, which includes an elaborate system prompt for refinement.
In image-to-video mode, this node is bypassed by default.
Best practice: Take the enhanced prompt and manually adjust it, especially if you want your character to say a specific phrase or need precise control over audio content.

The LTXV Add Guide Node: Control Center
This node is your primary control interface for IC-LoRA behavior.
Key parameters:
Frame index: Set to 0 by default, ensuring guidance starts from the very beginning of the video.
Strength: Set to 1.0 by default (full strength).
Important: Values below 1.0 can cause the reference video to "pop" or bleed through in the output. Generally, you don't want to change this unless debugging specific issues.
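In plain terms, the defaults amount to the following (the keys are descriptive labels, not the node's actual socket names):

```python
# Illustrative defaults for the guidance node. Keys are descriptive
# labels, not the actual ComfyUI socket names.
guide_settings = {
    "frame_index": 0,  # apply guidance from the very first frame
    "strength": 1.0,   # full strength; < 1.0 risks reference bleed-through
}
```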

Multi-Stage Sampling
Like the distilled image-to-video workflow, IC-LoRA uses a two-stage sampling process:
Stage 1: Low-Resolution Generation
- Generates at lower resolution first
- Uses Euler ancestral sampler with 8 sigma steps
- Faster iteration and lower VRAM usage
Stage 2: Upscaling
- Upscales to final target resolution
- Refines details while maintaining motion consistency
This multiscale approach balances quality and performance.
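The data flow can be summarized in a few lines. This is a schematic only: the stub functions stand in for the workflow's sampler and upsampler nodes and are not real APIs.

```python
# Schematic of the two-stage flow. The stubs stand in for the
# workflow's sampler and upsampler nodes; they are not real APIs.

def sample_low_res(prompt: str, guide_latents, steps: int = 8):
    """Stage 1: Euler ancestral sampling at reduced resolution,
    conditioned on the IC-LoRA guidance latents."""
    ...

def upsample_to_target(low_res_latents):
    """Stage 2: the LTX upsampler refines detail at the target
    resolution while preserving stage-1 motion."""
    ...

def generate(prompt: str, guide_latents):
    low = sample_low_res(prompt, guide_latents, steps=8)  # fast, low VRAM
    return upsample_to_target(low)                        # final quality pass
```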

Best Practices for IC-LoRA Workflows
Memory Management
Run one IC-LoRA group at a time:
Running multiple IC-LoRA modes simultaneously (e.g., Canny + Depth + Pose) will cause VRAM issues. Choose the mode that best fits your control needs and mute the other two groups.
Image-to-Video Alignment
Generate the first frame from the reference video:
For I2V mode, use ControlNet or another image generation model to create a first frame derived from your reference video's first frame. This prevents:
- Jump cuts at video start
- Visual discontinuities
- Temporal artifacts
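A quick way to grab that first frame for your ControlNet pass is a minimal OpenCV sketch like the one below; the file paths are placeholders.

```python
import cv2

# Minimal sketch: pull the reference video's first frame so it can be
# fed to ControlNet (or another image model) when generating the I2V
# starting image. File paths are placeholders.
cap = cv2.VideoCapture("reference.mp4")
ok, first_frame = cap.read()
cap.release()
if ok:
    cv2.imwrite("reference_first_frame.png", first_frame)
```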
Prompt Customization
Manually adjust enhanced prompts:
While the prompt enhancer is helpful, manually refining its output gives you better control, especially for:
- Specific dialogue or audio content
- Precise visual details
- Style consistency
Keep Default Strength Values
The default guidance strength of 1.0 is optimized for most use cases. Only adjust if you're experiencing specific issues with reference video bleed-through.
Understanding the Output
The final result is a video that:
- Completely reimagines the visual style based on your prompt
- Locks in the original motion from the reference video
- Maintains temporal consistency across frames
- Synchronizes audio and video output

This separation of motion and style is what makes IC-LoRA a production-ready control mechanism: you get deterministic motion behavior with creative freedom over aesthetics.
Getting Started with IC-LoRA
Quick Start Checklist:
- Download the IC-LoRA workflow from the LTX Models repository
- Prepare a clean reference video with the motion you want to transfer
- Choose your IC-LoRA mode based on the control you need:
  - Canny for composition/edges
  - Depth for camera/3D structure
  - Pose for human motion
- Load reference assets (video + image)
- Write a style-focused prompt (avoid motion descriptions)
- Set text-to-video mode (true/false) based on your use case
- Generate and iterate
Next Steps:
- Experiment with all three IC-LoRA modes to understand their strengths
- Train custom IC-LoRAs using the LTX video trainer repository
- Build motion reference libraries for repeatable production workflows
- Combine with standard LoRAs for additional style control
Why IC-LoRA Matters for Production Workflows
IC-LoRA represents a fundamental shift in AI video control—moving from unpredictable prompt-based motion to deterministic, reference-driven motion transfer.
Key advantages:
- Repeatability: The same reference video produces consistent motion across generations
- Temporal coherence: Frame-to-frame consistency improves dramatically
- Separation of concerns: Motion and style become independent variables
- Production scalability: Build libraries of motion references for systematic reuse
This makes LTX-2 with IC-LoRA suitable for professional workflows where consistency and predictability matter more than creative serendipity.
Conclusion
IC-LoRA unlocks precise motion control in LTX-2 by separating motion from visual styling. Whether you need to preserve edges with Canny, control camera movement with Depth, or transfer human performance with Pose, IC-LoRA gives you deterministic, repeatable results that work in production environments.
The key to success: let the reference video handle motion, and use your prompts purely for visual style. This separation makes AI video generation more predictable, consistent, and production-ready.
Ready to get started? Download the IC-LoRA workflow, choose your control mode, and start experimenting. For advanced users, explore the LTX video trainer to create custom IC-LoRAs tailored to your specific motion control needs.
