LTX Models (LTX-2, LTX-2.3) vs CogVideoX 1.5

LTX is the open-weights enterprise model for 2026: 22B parameters, native 4K, audio-to-video, LoRA fine-tuning, and API pricing from $0.04/sec. CogVideoX 1.5 is a 5B research model capped at 768p: a starting point, not a production foundation.

| Feature | LTX-2.3 | CogVideoX 1.5 |
|---|---|---|
| Developer | Lightricks | Zhipu AI (ZAI) |
| Parameters | 22B | 5B |
| Open Source | Yes | Yes |
| On-Prem | Yes (self-host) | Yes (self-host) |

OUTPUT QUALITY

| Feature | LTX-2.3 | CogVideoX 1.5 |
|---|---|---|
| Native 4K Rendering | Yes (3840×2160) | No (768p / 1360×768) |
| Max Video Length | 20 sec (Fast) / 10 sec (Pro) | ~10 sec (161 frames @ 16 fps) |
| Frame Rate (fps) | Up to 50 fps | 16 fps |
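
As a quick sanity check on the figures above, the short Python sketch below recomputes the CogVideoX 1.5 clip length from its frame count and compares per-frame pixel counts at each model's maximum listed resolution. The helper names are illustrative only, and every input is copied from the table.

```python
# Sanity-check arithmetic for the output-quality figures quoted above.
# Helper names are illustrative; all inputs come from the comparison table.

def clip_seconds(frames: int, fps: float) -> float:
    """Clip duration in seconds for a given frame count and frame rate."""
    return frames / fps


def pixels_per_frame(width: int, height: int) -> int:
    """Number of pixels in a single frame."""
    return width * height


# CogVideoX 1.5: 161 frames at 16 fps.
cog_seconds = clip_seconds(161, 16)

# Maximum listed resolutions: 3840x2160 (LTX) vs 1360x768 (CogVideoX 1.5).
ltx_pixels = pixels_per_frame(3840, 2160)
cog_pixels = pixels_per_frame(1360, 768)
pixel_ratio = ltx_pixels / cog_pixels

if __name__ == "__main__":
    print(f"CogVideoX 1.5 clip length: {cog_seconds:.2f} s")
    print(f"LTX pixels per frame: {ltx_pixels:,}")
    print(f"CogVideoX 1.5 pixels per frame: {cog_pixels:,}")
    print(f"Pixel ratio (LTX / CogVideoX 1.5): {pixel_ratio:.2f}x")
```

The result confirms the "~10 sec" figure (161 / 16 is just over 10 seconds) and shows that a native 4K frame carries roughly eight times as many pixels as a 1360×768 frame.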

SPEED & COST

| Feature | LTX-2.3 | CogVideoX 1.5 |
|---|---|---|
| 8 sec FHD Generation Time | ~15 sec (H100 cloud) | ~1 min (H100) |
| API Pricing (per second of video) | $0.04/sec (Fast 1080p); $0.06/sec (Pro 1080p); $0.16/sec (Fast 4K); $0.24/sec (Pro 4K) | ~$0.02/sec ($0.20/10-sec via fal.ai) |
| Free Access | Yes – open-source + free Desktop app | Yes – self-host open weights |
| Subscription Plans (non-API access) | Free (self-host & Desktop) | Free (self-host) |
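
To make the per-second API pricing easier to budget against, here is a minimal Python sketch that turns the rates quoted above into a per-clip cost. The function and dictionary names are hypothetical and are not part of any LTX or fal.ai SDK. The rates are copied from the pricing row, and the CogVideoX 1.5 rate is the approximate fal.ai figure.

```python
# Per-clip cost estimate from the per-second API rates quoted above (USD).
# All names here are hypothetical helpers, not a real SDK.

LTX_USD_PER_SEC = {
    ("fast", "1080p"): 0.04,
    ("pro", "1080p"): 0.06,
    ("fast", "4k"): 0.16,
    ("pro", "4k"): 0.24,
}

# Approximate CogVideoX 1.5 rate via fal.ai ($0.20 per 10-second clip).
COGVIDEOX_USD_PER_SEC = 0.02


def ltx_clip_cost(seconds: float, tier: str = "fast", resolution: str = "1080p") -> float:
    """Estimated LTX API cost in USD for a clip of the given length."""
    rate = LTX_USD_PER_SEC[(tier, resolution)]
    return round(seconds * rate, 2)


def cogvideox_clip_cost(seconds: float) -> float:
    """Estimated CogVideoX 1.5 cost in USD at the approximate fal.ai rate."""
    return round(seconds * COGVIDEOX_USD_PER_SEC, 2)


if __name__ == "__main__":
    for (tier, resolution) in LTX_USD_PER_SEC:
        cost = ltx_clip_cost(10, tier, resolution)
        print(f"LTX {tier} {resolution}, 10 s clip: ${cost:.2f}")
    print(f"CogVideoX 1.5, 10 s clip: ${cogvideox_clip_cost(10):.2f}")
```

At these rates a 10-second Fast 1080p clip costs about $0.40 on the LTX API against roughly $0.20 for CogVideoX 1.5 via fal.ai. Self-hosting either model avoids the per-second fee but moves the cost to hardware.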

CAPABILITIES

| Feature | LTX-2.3 | CogVideoX 1.5 |
|---|---|---|
| Text-to-Video | Yes | Yes |
| Image-to-Video | Yes | Yes |
| Retake | Yes (LTX Retake) | No |
| HDR Output | Yes | No |
| Extend | Yes | No |
| LipDub | Yes | No |
| Audio-to-Video | Yes – native multimodal | No |
| Multi-modal Inputs (text + image + audio + video) | All four | Text + Image |
| Motion Control | Yes – full control | Limited |
| Character Consistency | Yes – via LoRA fine-tuning | Limited |
| Content Moderation / Limits | No limits (open source) | No limits (open source) |

DEVELOPER & ENTERPRISE

| Feature | LTX-2.3 | CogVideoX 1.5 |
|---|---|---|
| LoRA / Fine-tuning | Yes – LoRA + IC-LoRA | Yes – CogKit |
| Fully Customizable | Yes | Yes |
| Runs on Consumer-Grade GPUs | Yes | Yes |
| ComfyUI / Diffusers Support | Yes | Yes |

SUMMARY

| Feature | LTX-2.3 | CogVideoX 1.5 |
|---|---|---|
| Best For | Enterprise teams needing on-prem deployment, full model customization, IP protection, and zero marginal cost at scale | Developers fine-tuning lightweight open-source models on modest hardware |

Customer Voices

Success, Engineered Together

"For professional studios, this level of control is not optional.
Training and steering video models like LTX is the most viable way to align AI with real production needs, where predictability, ownership, and creative intent matter as much as visual quality."
Mohamed Oumoumad
CTO, Gear Productions

The LTX Stack

Build, Create, and Scale with LTX

Production-grade video generation models designed to hold up under real workloads. Built for long sequences, precise motion, and high-fidelity output, from fast iteration to final-quality renders.

HDR Output

Delivered as an IC-LoRA on LTX-2.3. Generate directly in HDR or convert existing SDR footage to EXR. More grading latitude, more range, ready for real finishing pipelines.

Native Portrait

Generate vertical video up to 1080×1920 — trained on portrait-orientation data, not cropped from landscape.

Audio to Video

Generate video where voice, music, and sound effects define structure, pacing, and motion. Built for production-grade workflows that require precise, harmonious control over audio-led scenes, from podcasts and avatars to voice-driven clips, not one-off demos or talking heads.

20 sec Clip

Extend creative range with long-form generation. Produce up to 20 seconds of high-fidelity video with complete control and consistent style.
