How to Run a Video Generation Model Locally: A Low VRAM Guide

Set up local AI video generation on consumer GPUs. Covers VRAM tiers, pipeline selection, FP8 quantization, and ComfyUI workflows.

LTX Team

May 19, 2026

Key Takeaways:

Local LTX-2.3 video generation requires CUDA 13+ and an Nvidia GPU, with the distilled model (DistilledPipeline) as the recommended starting point for hardware under 80GB VRAM due to its 8-step inference and lower memory footprint than the dev checkpoint.
FP8 quantization (fp8-cast for all GPUs, fp8-scaled-mm for Hopper) is the primary lever for reducing VRAM usage and lets the distilled model run on 16-24GB consumer GPUs. Always set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True before any pipeline run.
Use the incremental scaling approach: validate at minimum settings first (low resolution, 49 frames, fp8-cast), then increase resolution and frame count one step at a time to find the highest stable configuration for your hardware.

Running AI video generation locally gives you full control over your workflow, no API costs, and no rate limits. The trade-off is hardware: video generation models are memory-intensive, and running them on consumer hardware requires navigating VRAM constraints, optimization settings, and the right pipeline choices.

This guide covers how to set up local AI video generation using LTX-2.3, an open-source, DiT-based model with multiple pipeline variants optimized for different hardware tiers. It focuses specifically on low-VRAM scenarios, including quantization options and the distilled model for faster, lighter inference.

Prerequisites

Hardware Requirements

LTX-2.3 requires CUDA 13+ and targets Nvidia GPUs. The full dev model at default precision requires 80GB+ VRAM for comfortable operation. For lower-VRAM GPUs:

• 32GB VRAM: Run the dev model with FP8 quantization enabled

• 16-24GB VRAM: Run the distilled model with FP8 quantization, using one IC-LoRA group at a time

• 8-12GB VRAM: Run the distilled model at reduced resolution with aggressive memory optimizations

Community members have explored running LTX-2.3 on consumer GPUs with far less VRAM than 80GB, such as 24GB cards like the RTX 4090, using specific settings. The incremental approach below walks through how to find stable settings for your hardware.
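Finding stable settings starts with knowing how much VRAM you actually have, and how much is free once other applications have taken their share. The sketch below assumes PyTorch with CUDA support is installed. It uses `torch.cuda.mem_get_info`, which returns free and total bytes. The rounding helper `nominal_gb` is an assumption of this sketch: reported capacity can sit slightly below the marketed size, so the helper rounds to match the tiers above.

```python
"""Report total and currently free VRAM on a CUDA GPU (sketch)."""
from typing import Dict, Optional


def nominal_gb(gib: float) -> int:
    """Round a reported capacity to the nearest whole size for comparison with the tiers."""
    return round(gib)


def vram_report(device: int = 0) -> Optional[Dict[str, float]]:
    """Return total/free VRAM in GiB, or None if PyTorch or a CUDA GPU is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)  # (free, total) in bytes
    total_gib = total_bytes / 2**30
    return {
        "total_gib": total_gib,
        "free_gib": free_bytes / 2**30,  # lower than total if other apps hold GPU memory
        "nominal_gb": nominal_gb(total_gib),
    }


if __name__ == "__main__":
    report = vram_report()
    if report is None:
        print("No CUDA GPU visible to PyTorch")
    else:
        print(f"{report['nominal_gb']}GB card, {report['free_gib']:.1f} GiB free right now")
```

A large gap between total and free memory is worth closing before you tune any pipeline settings.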

Software Dependencies

LTX-2.3 requires CUDA 13+ and runs on Python, using uv for dependency management. Linux is the primary supported environment. Windows and macOS are not officially supported, though community workarounds exist.
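After installation, a quick environment check can catch an unsupported OS or an old CUDA build before a long model load fails. This is a minimal sketch that assumes it runs inside the project's virtual environment. `torch.version.cuda` reports the CUDA version PyTorch was built against, not the driver version; `nvidia-smi` shows the driver side. The `min_cuda_major=13` default mirrors the requirement above.

```python
import platform
from typing import Optional


def parse_cuda_major(version: Optional[str]) -> Optional[int]:
    """Extract the major version from a string like '13.0'; None if missing or malformed."""
    if not version:
        return None
    try:
        return int(version.split(".")[0])
    except ValueError:
        return None


def environment_report(min_cuda_major: int = 13) -> dict:
    """Collect the facts the Software Dependencies section cares about."""
    try:
        import torch
        cuda_build = torch.version.cuda  # None for CPU-only PyTorch builds
        gpu_visible = torch.cuda.is_available()
    except ImportError:
        cuda_build, gpu_visible = None, False
    major = parse_cuda_major(cuda_build)
    return {
        "os": platform.system(),
        "is_linux": platform.system() == "Linux",  # primary supported environment
        "torch_cuda_build": cuda_build,
        "cuda_ok": major is not None and major >= min_cuda_major,
        "gpu_visible": gpu_visible,
    }


if __name__ == "__main__":
    print(environment_report())
```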

Installation

1. Clone the repository: git clone https://github.com/Lightricks/LTX-2.git

2. Set up the environment: cd LTX-2 && uv sync --frozen && source .venv/bin/activate

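3. Set the memory environment variable before running any pipeline. Export it so that child processes such as python see it, or prefix it to each pipeline command as in the FP8 example:

export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True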
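If you drive the pipelines from your own Python script rather than the shell, you can set the same variable in-process. It must be in place before PyTorch initializes its CUDA allocator, so setting it before `import torch` is the safe choice. The sketch below assumes the allocator config's comma-separated `option:value` format. It keeps any other options already present and makes `expandable_segments:True` win. The helper name `ensure_expandable_segments` is hypothetical.

```python
import os
from typing import MutableMapping

KEY = "PYTORCH_CUDA_ALLOC_CONF"


def ensure_expandable_segments(env: MutableMapping[str, str] = os.environ) -> str:
    """Add expandable_segments:True to the allocator config, keeping other options."""
    options = [opt.strip() for opt in env.get(KEY, "").split(",") if opt.strip()]
    # Drop any existing expandable_segments setting so ours takes effect.
    options = [opt for opt in options if not opt.startswith("expandable_segments")]
    options.append("expandable_segments:True")
    env[KEY] = ",".join(options)
    return env[KEY]


if __name__ == "__main__":
    ensure_expandable_segments()
    import torch  # noqa: E402  imported only after the allocator config is set
    print(os.environ[KEY])
```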

Download the model files from Hugging Face. The files you need depend on which pipeline you use:

• Distilled checkpoint: ltx-2.3-22b-distilled.safetensors

• Spatial upsampler: Required for two-stage pipeline outputs

• Gemma text encoder: Required for all pipelines

• Dev checkpoint: ltx-2.3-22b-dev.safetensors (only if running the full dev model)
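A small pre-flight check can confirm the right files are in place before a pipeline spends minutes loading. The two checkpoint names below come from the list above. The upsampler file name and the Gemma directory name are hypothetical placeholders, since the list does not give them; replace them with the names of the files you actually downloaded. The `two_stage` flag follows the note that the spatial upsampler is required for two-stage outputs.

```python
from pathlib import Path
from typing import List

# Checkpoint names are taken from the download list.
DISTILLED_CKPT = "ltx-2.3-22b-distilled.safetensors"
DEV_CKPT = "ltx-2.3-22b-dev.safetensors"
# HYPOTHETICAL placeholders: replace with the names of the files you downloaded.
UPSAMPLER = "spatial-upsampler.safetensors"
GEMMA_DIR = "gemma-text-encoder"


def required_files(use_dev: bool, two_stage: bool) -> List[str]:
    """List the files a run needs, following the download list."""
    files = [GEMMA_DIR]  # text encoder: required for all pipelines
    files.append(DEV_CKPT if use_dev else DISTILLED_CKPT)
    if two_stage:
        files.append(UPSAMPLER)  # spatial upsampler: needed for two-stage outputs
    return files


def missing_files(model_dir: Path, use_dev: bool = False, two_stage: bool = False) -> List[str]:
    """Return the required entries that do not exist under model_dir."""
    return [name for name in required_files(use_dev, two_stage) if not (model_dir / name).exists()]


if __name__ == "__main__":
    # "models" is a hypothetical download directory.
    print(missing_files(Path("models"), use_dev=False, two_stage=True))
```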

Choosing the Right Pipeline for Your VRAM

LTX-2.3 provides multiple pipelines optimized for different speed-quality trade-offs:

• TI2VidTwoStagesPipeline: Two-stage pipeline using the dev model. Highest quality output. Requires 80GB+ VRAM unquantized or 32GB+ with FP8.

• DistilledPipeline: Single checkpoint with 8 predefined sigma steps. Fastest inference. Runs on 16GB+ with quantization.

• ICLoraPipeline: Video-to-video with IC-LoRA structural control (Pose, Depth, Edge). Distilled model only; the IC-LoRA adapters need additional memory on top of the base model.

For low-VRAM setups, start with the DistilledPipeline. It uses significantly less memory than the dev pipeline and produces good results for most use cases.
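The selection rules above can be condensed into a small decision function. This is a sketch of the guide's thresholds, not an official selector. It follows the guidance that the distilled model is the starting point below 80GB, and switches to the two-stage dev pipeline at 32GB+ only when you opt in with the hypothetical `prefer_quality` flag. It always suggests fp8-cast below 80GB; on Hopper GPUs with tensorrt_llm installed, fp8-scaled-mm is the alternative.

```python
from typing import Optional, Tuple


def choose_pipeline(
    vram_gb: float,
    need_ic_lora: bool = False,
    prefer_quality: bool = False,
) -> Tuple[str, Optional[str]]:
    """Return (pipeline name, --quantization value or None) per the thresholds above."""
    nominal = round(vram_gb)  # e.g. a card reporting 23.99 counts as 24
    quantization = None if nominal >= 80 else "fp8-cast"
    if need_ic_lora:
        # IC-LoRA runs on the distilled model only; adapters add memory on top.
        return "ICLoraPipeline", quantization
    if nominal >= 80 or (prefer_quality and nominal >= 32):
        return "TI2VidTwoStagesPipeline", quantization
    # Default for low-VRAM setups; below 16GB, also reduce resolution.
    return "DistilledPipeline", quantization


if __name__ == "__main__":
    print(choose_pipeline(24))
```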

FP8 Quantization

FP8 quantization reduces the transformer's memory footprint by representing weights in 8-bit floating point. LTX-2.3 supports two FP8 backends:

• fp8-cast: Downcasts weights on load, upcasts during inference. Works on any GPU with FP8 support. Enable with --quantization fp8-cast

• fp8-scaled-mm: Uses TensorRT-LLM scaled matrix multiplication for better performance on Hopper (H100) GPUs. Requires the tensorrt_llm package.
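The savings follow from simple arithmetic: 16-bit weights take 2 bytes each, and FP8 weights take 1. The sketch below assumes that "22b" in the checkpoint names means roughly 22 billion parameters. It treats every parameter as quantized, which makes the result an upper bound. It counts weights only. Activations and other components need memory on top, and how much must be resident at once depends on the pipeline's other memory optimizations. Read it as a per-byte comparison, not a VRAM requirement.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to store the weights, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: "22b" in the checkpoint names means roughly 22 billion parameters.
PARAMS = 22e9

bf16_gb = weight_memory_gb(PARAMS, 2)  # 16-bit floats: 2 bytes per weight
fp8_gb = weight_memory_gb(PARAMS, 1)   # FP8: 1 byte per weight
print(f"16-bit: {bf16_gb:.0f} GB, FP8: {fp8_gb:.0f} GB, saved: {bf16_gb - fp8_gb:.0f} GB")
# prints: 16-bit: 44 GB, FP8: 22 GB, saved: 22 GB
```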

Enable via CLI:

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python -m ltx_pipelines.ti2vid_two_stages --quantization fp8-cast --checkpoint-path /path/to/checkpoint.safetensors
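When scripting repeated runs, the same invocation can be assembled in Python. The allocator setting is applied to the child process only. This sketch uses only the module and the two flags shown above. The pipeline likely needs further arguments, such as a prompt, and the sketch passes those through unchanged rather than guessing their names; check the module's `--help` for them. `build_command` and `run` are hypothetical helper names.

```python
import os
import subprocess
import sys
from typing import List, Optional, Sequence


def build_command(
    checkpoint_path: str,
    quantization: Optional[str] = "fp8-cast",
    module: str = "ltx_pipelines.ti2vid_two_stages",
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Assemble the CLI invocation above as an argument list."""
    cmd = [sys.executable, "-m", module]  # sys.executable: the active venv's Python
    if quantization:
        cmd += ["--quantization", quantization]
    cmd += ["--checkpoint-path", checkpoint_path]
    cmd += list(extra_args)  # further pipeline arguments, passed through unchanged
    return cmd


def run(cmd: List[str]) -> int:
    """Run the pipeline; overrides any existing allocator config for the child process."""
    env = dict(os.environ, PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True")
    return subprocess.run(cmd, env=env, check=False).returncode


if __name__ == "__main__":
    print(" ".join(build_command("/path/to/checkpoint.safetensors")))
```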

ComfyUI Integration

For a visual, node-based workflow, LTX-2.3 integrates with ComfyUI through the official ComfyUI-LTX plugin. The plugin exposes LTX-2.3's generation pipelines as nodes in the ComfyUI graph interface. It covers all pipeline types, including text-to-video, image-to-video, audio-to-video, and IC-LoRA conditioning. For building video generation workflows, this is often more accessible than the command line.

LTX Desktop

LTX Desktop is a standalone GUI application for local AI video generation. It wraps the core LTX-2.3 pipelines in a graphical interface and handles model management, pipeline configuration, and output organization without requiring the command line. For users who prefer a desktop application over the CLI or ComfyUI, LTX Desktop provides the same generation capabilities with lower setup overhead.

Incremental Scaling Approach for Low-VRAM Hardware

When working with constrained VRAM, start minimal and scale up incrementally rather than attempting full settings immediately:

1. Start minimal: Begin with the distilled model at low resolution (512 pixels, 49 frames). Use --quantization fp8-cast and the memory environment variable. If this generates successfully, proceed.

2. Increase resolution: Move to your target resolution one step at a time (512 → 720 → 1080). Verify stability at each level before continuing.

3. Increase frame count: Once resolution is stable, increase the frame count toward your target. Valid frame counts satisfy (F-1) % 8 == 0, that is, F = 8k + 1 (for example 49, 97, or 121).

4. Add IC-LoRA: If using IC-LoRA (Pose, Depth, Edge), add one LoRA group at a time. Multiple simultaneous groups on lower-VRAM GPUs may cause OOM errors.
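Steps 1 to 3 can be automated as a ladder search: raise the resolution at the minimal frame count, then raise the frame count at the highest stable resolution, and stop at the first out-of-memory error. In this sketch, `try_generate` is a hypothetical callable that wraps your own pipeline run. The ladders are examples: the resolutions come from step 2, and the frame counts are 8k + 1 values. With real PyTorch, pass `oom_error=torch.cuda.OutOfMemoryError`. Running each attempt in a fresh process, or at least calling `torch.cuda.empty_cache()` between attempts, avoids carrying fragmented memory forward. Step 4 (IC-LoRA groups) can be added as a third ladder in the same style.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Type

Config = Tuple[int, int]  # (resolution in pixels, frame count)


def is_valid_frames(frames: int) -> bool:
    """True if frames has the form 8k + 1, i.e. (F - 1) % 8 == 0."""
    return frames >= 1 and (frames - 1) % 8 == 0


def snap_frames(frames: int) -> int:
    """Round a requested frame count down to the nearest valid 8k + 1 value."""
    return max(1, (frames - 1) // 8 * 8 + 1)


def find_stable_config(
    try_generate: Callable[[int, int], None],  # hypothetical: wraps your pipeline call
    resolutions: Sequence[int] = (512, 720, 1080),
    frame_counts: Sequence[int] = (49, 97, 121),
    oom_error: Type[BaseException] = MemoryError,
) -> Optional[Config]:
    """Raise resolution first, then frame count; return the last config that ran."""
    frames_ladder: List[int] = [snap_frames(f) for f in frame_counts]
    start_frames = frames_ladder[0]

    # Steps 1-2: climb the resolution ladder at the minimal frame count.
    stable_res: Optional[int] = None
    for res in resolutions:
        try:
            try_generate(res, start_frames)
        except oom_error:
            break
        stable_res = res
    if stable_res is None:
        return None  # even the minimal setting fails on this GPU

    # Step 3: at the highest stable resolution, climb the frame-count ladder.
    best: Config = (stable_res, start_frames)
    for frames in frames_ladder[1:]:
        try:
            try_generate(stable_res, frames)
        except oom_error:
            break
        best = (stable_res, frames)
    return best


if __name__ == "__main__":
    def fake_generate(res: int, frames: int) -> None:  # stands in for a real pipeline run
        if res > 720 or frames > 97:
            raise MemoryError

    print(find_stable_config(fake_generate))  # prints (720, 97)
```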

Troubleshooting OOM Errors

Out-of-memory (OOM) errors during generation usually trace back to one of a few causes. Common fixes:

• Ensure PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True is set

• Enable FP8 quantization with --quantization fp8-cast

• Reduce resolution or frame count

• Switch from the dev to the distilled checkpoint

• If using IC-LoRA, reduce to a single LoRA group

• Close other GPU-memory-intensive applications
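The checklist above can be turned into run-specific advice. Given how a run was configured, the sketch returns the fixes that still apply, cheapest first. It is hypothetical: `oom_suggestions` is not part of LTX-2.3. It spots the dev checkpoint by a simple heuristic, the "-dev" in its file name.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def oom_suggestions(
    quantization: Optional[str],
    checkpoint_path: str,
    ic_lora_groups: int = 0,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Turn the OOM checklist into concrete next steps for a given run."""
    tips: List[str] = []
    if "expandable_segments:True" not in env.get("PYTORCH_CUDA_ALLOC_CONF", ""):
        tips.append("Set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True")
    if not quantization:
        tips.append("Enable FP8 quantization with --quantization fp8-cast")
    # Heuristic: the dev checkpoint's file name contains "-dev".
    if "-dev" in Path(checkpoint_path).name:
        tips.append("Switch from the dev to the distilled checkpoint")
    if ic_lora_groups > 1:
        tips.append("Reduce to a single IC-LoRA group")
    # These remain options even after the cheaper fixes are in place.
    tips.append("Reduce resolution or frame count")
    tips.append("Close other GPU-memory-intensive applications")
    return tips


if __name__ == "__main__":
    for tip in oom_suggestions(None, "/path/to/ltx-2.3-22b-dev.safetensors", ic_lora_groups=2):
        print("-", tip)
```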

Conclusion

Local video generation with LTX-2.3 is accessible on consumer hardware with the right settings. Start with the distilled model, enable FP8 quantization, and use the incremental approach to find the highest stable settings for your GPU. For a comprehensive breakdown of VRAM tiers and specific hardware recommendations, see the hardware guide.
