
Creating Camera Control With AI Video: Master Dolly, Pan, Jib, and Focus Motion

Learn how to use camera control LoRAs in LTX-2 to add cinematic movement to AI-generated video. Step-by-step guide with code examples for dolly, pan, jib, and focus motion.

LTX Team
Key Takeaways:
  • Camera control LoRAs are lightweight adapters that teach LTX-2 and LTX-2.3 to execute specific, repeatable camera movements — dolly, pan, tilt, jib, and rack focus — instead of leaving motion up to the model's interpretation.
  • Stacking multiple LoRAs unlocks complex cinematic choreography: pan + dolly creates tracking shots, dolly + rack focus creates dramatic reveals, and sequential jib + pan creates spatial storytelling.
  • Camera movement is a creative language — a dolly forward draws the viewer in, a rack focus redirects attention, a jib up reveals scale — and LoRAs let you direct that language with precision.

If you've experimented with AI video generation, you've probably noticed something: the camera tends to sit still. Your prompts generate compelling subjects, dynamic scenes, creative compositions—and then the lens just... waits.

This is where camera control LoRAs change the game. They're not magic, and they're not complicated.

They're a lightweight fine-tuning technique that teaches your LTX-2 and LTX-2.3 models to understand and execute specific camera movements. Dolly shots. Pan and tilt motions. Jib movements. Even rack focus.

In this guide, we'll walk through what LoRAs are, how they work in LTX-2's motion guidance pipeline, and exactly how to stack and combine them to create cinematic video.

You'll see practical code examples for each movement type, and we'll cover real workflows where camera control actually matters.

By the end, you'll be able to generate video with precise, repeatable camera motion—whether you're building a product showcase, a cinematic landscape pan, or a dynamic interview setup.

And you'll understand why this matters for anyone doing serious video generation work.

Understanding LoRAs in LTX-2

What Are LoRAs and Why They Matter

LoRA stands for Low-Rank Adaptation. In plain terms: it's a lightweight way to teach a model new skills without retraining the whole thing from scratch.

Think of it like muscle memory. Your LTX-2 base model is like a classical pianist who can play any composition.

A LoRA is like intensive practice on a specific technique—how to execute a flawless Rachmaninoff run, or how to nail a particular improvisation style. The pianist doesn't forget their classical training. They just develop a specialized response pattern for that specific input.

The same principle applies to LoRAs in video generation.

Your LTX-2 model already knows how to generate video. A camera control LoRA teaches it a specific pattern: when you send motion guidance parameters that look like "pan left," the model learns to respond with a camera movement that matches that intent.

Why does this matter? Because without LoRAs, you're limited to whatever motion the base model infers from your text prompt. With LoRAs, you have direct, repeatable control.

Your pan will look like a pan every time. Your dolly will track forward with consistent velocity. You're not gambling on the model's interpretation—you're specifying the behavior.

How LoRAs Fit Into LTX-2's Architecture

LTX-2 accepts motion guidance as structured input—think of it as a motion grammar that describes camera behavior, object movement, and temporal flow. LoRAs live in that guidance layer.

They're small neural network adapters that sit alongside the main model and specialize in translating high-level motion intent into low-level guidance signals.

When you load a camera control LoRA, you're essentially loading a 50-100MB file that contains learned patterns.

The model says: "When the user provides this parameter pattern, I should generate video that looks like this." No retraining. No fine-tuning the whole model. Just a small, targeted injection of knowledge.
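The low-rank idea itself is simple enough to sketch directly. This toy NumPy example (illustrative only — not the actual LTX-2 internals) shows how a frozen weight matrix gets a small, trainable low-rank update, and why the resulting adapter file stays so small:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, rank = 512, 512, 8   # a rank-8 adapter on a 512x512 layer
alpha = 16                        # scaling factor, as in typical LoRA setups

W = rng.normal(size=(d_out, d_in))          # frozen base weight
A = rng.normal(size=(rank, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, rank))                 # trainable up-projection (zero-init)

def adapted_forward(x):
    # Base behavior plus the low-rank update: W x + (alpha / rank) * B (A x)
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B zero-initialized, the adapter is a no-op until training updates it.
assert np.allclose(adapted_forward(x), W @ x)

# The adapter stores only 2 * rank * 512 parameters vs. 512 * 512 for the layer.
full_params = d_out * d_in
lora_params = rank * d_in + d_out * rank
print(f"adapter is {lora_params / full_params:.1%} of the layer's size")
```

The base model's weights never change; loading or unloading the LoRA just adds or removes the small `B @ A` term.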

The Four Essential Camera Movements

1. Dolly (Forward/Backward Movement)

A dolly shot moves the camera forward or backward through the scene. It creates a sense of spatial exploration—you're not just zooming into the subject, you're moving into their space.

In LTX-2 with camera control LoRA:


from ltx_client import LTXClient

client = LTXClient(api_key="your-key")

motion_guidance = {
   "camera_type": "dolly",
   "direction": "forward",
   "speed": 0.6,  # 0 = none, 1 = fast
   "duration": 1.0  # full length of video
}

video = client.generate(
   prompt="A serene forest clearing, sunlight filtering through tall pines",
   motion_guidance=motion_guidance,
   loras=["camera-dolly-forward"]
)

The speed parameter controls how aggressive the dolly is. 0.3 feels like a subtle push-in. 0.7+ feels cinematic and intentional.
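If you find yourself reaching for the same speeds repeatedly, a tiny helper can name them. This is our own convenience wrapper, not part of the LTX-2 API — the preset values just follow the rough guide above:

```python
# Hypothetical presets for dolly speed, named by feel rather than raw numbers.
DOLLY_SPEED_PRESETS = {
    "subtle": 0.3,     # gentle push-in
    "moderate": 0.5,
    "cinematic": 0.7,  # deliberate, noticeable move
}

def dolly_guidance(direction="forward", speed="moderate", duration=1.0):
    """Build a dolly motion_guidance dict, accepting a preset name or a float."""
    value = DOLLY_SPEED_PRESETS.get(speed, speed)
    if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
        raise ValueError(f"speed must be a preset name or a float in [0, 1], got {speed!r}")
    return {
        "camera_type": "dolly",
        "direction": direction,
        "speed": float(value),
        "duration": duration,
    }

print(dolly_guidance(speed="cinematic"))
```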

Real use case: Product demos. A dolly forward on your product sitting on a desk immediately feels more dynamic than a static shot. It draws the viewer's eye and builds visual momentum.

2. Pan and Tilt (Horizontal and Vertical Camera Rotation)

A pan rotates the camera left or right. A tilt rotates it up or down. These are the most commonly used camera movements in video because they feel natural—like you're turning your head to look at something.


# Pan left across a landscape
motion_guidance = {
   "camera_type": "pan",
   "direction": "left",
   "arc_degrees": 45,  # how far to pan
   "speed": 0.5,
   "start_offset": 0.1  # delay before pan starts
}

video = client.generate(
   prompt="A sweeping mountain vista at golden hour",
   motion_guidance=motion_guidance,
   loras=["camera-pan-left"]
)

And for tilt:


# Tilt down from sky to landscape
motion_guidance = {
   "camera_type": "tilt",
   "direction": "down",
   "arc_degrees": 60,
   "speed": 0.4
}

video = client.generate(
   prompt="Dramatic sky transitioning to a desert landscape",
   motion_guidance=motion_guidance,
   loras=["camera-tilt-down"]
)

Pan and tilt are your workhorses. Use them to guide the viewer's attention, reveal new elements of the scene, or create a sense of spatial orientation.
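It helps to have a feel for how `arc_degrees` translates into on-screen motion. A quick back-of-the-envelope helper (our own arithmetic, assuming a 24 fps clip — not an API call) converts the arc into degrees swept per frame:

```python
def pan_rate(arc_degrees, clip_seconds, fps=24, start_offset=0.0, duration=1.0):
    """Degrees the camera sweeps per frame, given the clip length and the
    fraction of the clip (duration - start_offset) the pan occupies."""
    active_seconds = clip_seconds * (duration - start_offset)
    if active_seconds <= 0:
        raise ValueError("pan window must be positive")
    return arc_degrees / (active_seconds * fps)

# A 45-degree pan spread over a full 5-second, 24 fps clip:
print(round(pan_rate(45, 5), 3))  # 0.375 degrees per frame
```

Under half a degree per frame reads as a slow, scenic sweep; compressing the same arc into a shorter window raises the per-frame rate and makes the pan feel urgent.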

3. Jib (Crane-Like Movement)

A jib combines dolly and tilt — for example, pulling the camera back while tilting up, or pushing in while tilting down. It feels cinematic because it moves along multiple axes at once. Think of a crane arm on a film set.


# Jib up: move backward and tilt up
motion_guidance = {
   "camera_type": "jib",
   "primary_direction": "up",  # 'up' means backward + tilt up
   "dolly_intensity": 0.5,
   "tilt_intensity": 0.6,
   "duration": 1.0
}

video = client.generate(
   prompt="A character in a room, surrounded by tall architecture",
   motion_guidance=motion_guidance,
   loras=["camera-jib-up"]
)

Jib movements are powerful. A jib up reveals the scope of a scene. A jib down creates intimacy or drama. Use them sparingly—they're memorable, and overusing them dilutes their impact.

4. Rack Focus (Depth of Field Shift)

Rack focus shifts which plane of the scene is in sharp focus. On a real camera, this is done by rotating the lens's focus ring to move the focal plane from one subject to another. In LTX-2, the LoRA guides the model to generate video that simulates this effect.


# Rack focus from foreground to background
motion_guidance = {
   "camera_type": "rack_focus",
   "from_plane": "foreground",
   "to_plane": "background",
   "speed": 0.4,
   "softness": 0.5  # how gradual the focus shift is
}

video = client.generate(
   prompt="A workspace with a coffee cup in foreground and monitor in background",
   motion_guidance=motion_guidance,
   loras=["camera-rack-focus"]
)

Rack focus is subtle but powerful. It directs attention without moving the camera. It's perfect for interviews, product shots, or any scene where you want to guide the viewer's gaze.

Combining LoRAs: Building Complex Movements

Here's where it gets interesting. You can stack multiple LoRAs to create more complex camera choreography. A pan combined with a dolly creates a more dynamic, less predictable movement.

Example: Pan + Dolly (Tracking Shot)


# A tracking shot: move the camera right and forward simultaneously
motion_guidance = {
   "primary_movement": "pan_and_dolly",
   "pan_direction": "right",
   "pan_intensity": 0.4,
   "dolly_direction": "forward",
   "dolly_intensity": 0.5,
   "synchronization": "parallel"  # both movements happen at same time
}

video = client.generate(
   prompt="A person walking through a modern office, glass walls and natural light",
   motion_guidance=motion_guidance,
   loras=["camera-pan-right", "camera-dolly-forward"]
)

This is a tracking shot—the camera follows movement while also moving through space. It feels energetic and cinematic.

Example: Dolly + Rack Focus (Cinematic Reveal)


# Move into a subject while shifting focus—a classic cinematic move
motion_guidance = {
   "primary_movement": "dolly_with_focus_shift",
   "dolly_direction": "forward",
   "dolly_intensity": 0.6,
   "focus_shift": {
       "from_plane": "background",
       "to_plane": "subject",
       "timing": "simultaneous"
   }
}

video = client.generate(
   prompt="A mysterious figure emerging from fog in a forest",
   motion_guidance=motion_guidance,
   loras=["camera-dolly-forward", "camera-rack-focus"]
)

This combination feels incredibly cinematic. You're moving closer to the subject while the focus sharpens on them. It's intimate and intentional.

Example: Jib + Pan (Spatial Storytelling)


# Reveal the scope of a landscape, then direct attention
motion_guidance = {
   "movement_sequence": [
       {
           "type": "jib",
           "primary_direction": "up",
           "duration": 0.5
       },
       {
           "type": "pan",
           "direction": "left",
           "duration": 0.5,
           "start_offset": 0.5  # start when jib finishes
       }
   ]
}

video = client.generate(
   prompt="An ancient cathedral in a sprawling medieval town at sunset",
   motion_guidance=motion_guidance,
   loras=["camera-jib-up", "camera-pan-left"]
)

Sequential movements feel like intentional storytelling. You're showing the viewer the big picture, then drawing their eye to a specific element.
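When you chain movements like this, it's easy to mis-time the offsets. A small sanity check (our own sketch, mirroring the `movement_sequence` shape used above) catches overlaps and gaps before you spend a generation finding them:

```python
def validate_sequence(movements, tolerance=1e-6):
    """Check that sequential movements tile the clip: each one starts where
    the previous ends, and the total never runs past the clip length (1.0)."""
    cursor = 0.0
    for i, move in enumerate(movements):
        start = move.get("start_offset", 0.0)
        if abs(start - cursor) > tolerance:
            raise ValueError(f"movement {i} starts at {start}, expected {cursor}")
        cursor = start + move["duration"]
    if cursor > 1.0 + tolerance:
        raise ValueError(f"sequence runs past end of clip: {cursor}")
    return True

sequence = [
    {"type": "jib", "primary_direction": "up", "duration": 0.5},
    {"type": "pan", "direction": "left", "duration": 0.5, "start_offset": 0.5},
]
assert validate_sequence(sequence)
```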

Practical Workflows: When to Use Each Movement

Product Showcases

Dolly forward + slight pan. Start wide, move closer, pan slightly to show the product from a subtle angle. Feels dynamic without being distracting.


product_showcase = {
   "movements": [
       {"type": "dolly_forward", "intensity": 0.5},
       {"type": "pan_right", "intensity": 0.2, "start_offset": 0.3}
   ]
}

Landscape and Travel Content

Pan + tilt to guide the viewer through scenic beauty. These movements reveal new elements without being aggressive. Rack focus to draw attention to distant details.


landscape_reveal = {
   "movements": [
       {"type": "pan_left", "intensity": 0.4, "arc_degrees": 45},
       {"type": "tilt_up", "intensity": 0.3, "start_offset": 0.5},
       {"type": "rack_focus", "to_plane": "background", "start_offset": 0.7}
   ]
}

Interview and Dialogue Scenes

Subtle rack focus and minimal pan. The subject should remain the focus. Move the camera only to maintain visual interest or to shift emphasis between speakers.


interview_setup = {
   "movements": [
       {"type": "rack_focus", "softness": 0.6},
       {"type": "pan", "intensity": 0.15}  # very subtle
   ]
}

Cinematic Storytelling

Jib movements combined with rack focus. Create moments of visual drama. Reveal context through camera movement.


cinematic_moment = {
   "movements": [
       {"type": "jib_up", "intensity": 0.7},
       {"type": "rack_focus", "to_plane": "subject", "start_offset": 0.5}
   ]
}

Advanced Techniques

Matching Camera Motion to Subject Motion

The most sophisticated camera control happens when you align camera movement with what's happening in the scene. A character walking left? Pan with them. An object falling? Tilt down with it.


# Camera motion that follows subject action
motion_guidance = {
   "subject_action": "walking_left",
   "camera_response": "pan_with_subject",
   "lead_distance": 0.3,  # how far ahead of subject to frame
   "anticipation": 0.1  # start panning slightly before subject moves
}

video = client.generate(
   prompt="A character in a suit walking purposefully across a glass-walled office",
   motion_guidance=motion_guidance,
   loras=["camera-pan-tracking"]
)

This creates the feeling that the camera operator is actively watching and responding. It feels alive.
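The `anticipation` parameter is doing something simple underneath: starting the pan slightly before the subject moves. A one-liner (ours, purely to make the timing concrete) shows the relationship:

```python
def camera_pan_start(subject_start, anticipation=0.1):
    """Normalized time the camera begins panning, leading the subject's move
    by the anticipation fraction (clamped so it never precedes the clip)."""
    return max(0.0, subject_start - anticipation)

# If the subject starts walking a quarter of the way in,
# the camera commits to the pan just before they move.
print(round(camera_pan_start(0.25), 2))
```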

Parallax and Depth Layering

Use different camera speeds for foreground and background elements. A pan that moves faster across a distant mountain than across nearby trees creates depth.


motion_guidance = {
   "parallax_effect": {
       "foreground_pan_intensity": 0.6,
       "background_pan_intensity": 0.2,
       "pan_direction": "right"
   }
}

video = client.generate(
   prompt="A dense forest with mountains visible in the distance",
   motion_guidance=motion_guidance,
   loras=["camera-pan-right-with-parallax"]
)

Parallax is a depth cue. It tells the viewer immediately: there are multiple layers to this scene. The foreground is close, the background is far.
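The depth cue comes from the ratio of the two pan intensities. A quick illustration (our own arithmetic, assuming an intensity of 1.0 corresponds to sweeping one full frame width) makes the effect concrete:

```python
def layer_shifts(pan_intensity_by_layer, frame_width=1920):
    """Apparent horizontal travel of each layer over the clip, in pixels,
    assuming intensity 1.0 sweeps a full frame width. Purely illustrative."""
    return {layer: intensity * frame_width
            for layer, intensity in pan_intensity_by_layer.items()}

shifts = layer_shifts({"foreground": 0.6, "background": 0.2})
print(shifts)  # the foreground travels 3x farther than the background
```

That 3:1 ratio is what the eye reads as "the trees are close, the mountains are far."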

Easing and Acceleration

Smooth camera movements feel cinematic. Jerky movements feel like a mistake. Use easing functions to control how a movement accelerates and decelerates.


motion_guidance = {
   "camera_movement": "dolly_forward",
   "easing": {
       "type": "ease_in_out_cubic",
       "acceleration_phase": 0.3,  # 30% of movement for acceleration
       "deceleration_phase": 0.3   # 30% of movement for deceleration
   }
}

video = client.generate(
   prompt="A sleek car on a mountain road, morning mist",
   motion_guidance=motion_guidance,
   loras=["camera-dolly-forward"]
)

With easing, your dolly shot feels like a human operator—smoothly accelerating into the movement, then easing out. Without it, the camera feels mechanical.
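The `ease_in_out_cubic` curve named above is a standard easing function, easy to implement directly. Here is the symmetric textbook version (ignoring the separate acceleration/deceleration phase parameters, which LTX-2's guidance layer would handle internally):

```python
def ease_in_out_cubic(t):
    """Classic cubic ease-in-out: slow start, fast middle, slow finish.
    Maps normalized time t in [0, 1] to normalized progress in [0, 1]."""
    if t < 0.5:
        return 4 * t ** 3
    return 1 - ((-2 * t + 2) ** 3) / 2

# Camera position along the dolly track at a few points in the clip:
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  progress={ease_in_out_cubic(t):.4f}")
```

Note how little ground the camera covers in the first quarter of the clip (about 6%) versus the middle half — that uneven distribution is what reads as a human operator rather than a motor.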

Troubleshooting and Best Practices

Movement Feels Jerky or Unnatural

Use easing. Add a deceleration phase. Reduce the speed parameter slightly. Jerkiness usually means the motion is too fast for the scene complexity.

Camera Movement Seems to Fight the Subject

Check your pan direction vs. subject movement direction. If a character is walking left and you're panning right, they'll fight each other. Align the movement vectors.

Rack Focus Not Triggering

Ensure your prompt includes elements at different depths. "A blurred coffee cup in foreground and sharp monitor in background" gives the model something to work with. Vague prompts make rack focus harder.

LoRA Conflicts

Stacking too many LoRAs can create conflicting guidance. Start with one movement type, validate it works, then add a second. If something feels wrong, remove the most recent LoRA.
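Some conflicts can be caught before you ever submit a generation. A hypothetical pre-flight check (the conflict table is ours — extend it with whatever LoRA names you actually use) flags pairs that push the camera in opposite directions:

```python
# Pairs of LoRAs that issue opposing camera guidance. Hypothetical names,
# matching the convention used in the examples above.
CONFLICTING_PAIRS = {
    frozenset({"camera-pan-left", "camera-pan-right"}),
    frozenset({"camera-dolly-forward", "camera-dolly-backward"}),
    frozenset({"camera-tilt-up", "camera-tilt-down"}),
}

def check_lora_stack(loras):
    """Return every conflicting pair found in the stack (empty list if clean)."""
    conflicts = []
    for i, a in enumerate(loras):
        for b in loras[i + 1:]:
            if frozenset({a, b}) in CONFLICTING_PAIRS:
                conflicts.append((a, b))
    return conflicts

assert check_lora_stack(["camera-pan-right", "camera-dolly-forward"]) == []
print(check_lora_stack(["camera-pan-left", "camera-pan-right"]))
```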


Pushing Further: Experimental Techniques

Whip Pans (Advanced)

A whip pan is a very fast pan that's so quick it creates a momentary blur. It's disorienting but exciting. Use sparingly.


motion_guidance = {
   "camera_type": "pan_whip",
   "direction": "right",
   "speed": 0.95,  # Very fast
   "blur_intensity": 0.7,
   "duration": 0.3  # Short burst
}

Spiral and Orbit Movements

Some LoRAs support spiral or orbital camera movements—the camera rotates around a subject while also moving toward or away from it. Perfect for 360-degree product reveals.


motion_guidance = {
   "camera_type": "orbit",
   "orbit_axis": "vertical",  # rotate around vertical axis
   "orbit_degrees": 180,
   "dolly_direction": "forward",
   "dolly_intensity": 0.4
}

Synchronized Multi-Camera Perspectives

Advanced setup: combine LoRAs to simulate cutting between camera positions. This requires careful timing and clear prompts about spatial relationships.

The Takeaway: Intentional Camera Motion

Camera control LoRAs don't just make your videos prettier. They give you a language for visual storytelling. Instead of hoping the model interprets your words the way you meant them, you're directly specifying how the viewer should see the scene.

A dolly forward says: "Pay attention, we're going deeper."

A pan says: "Look what's over here."

A rack focus says: "This is important right now."

Jib up says: "This is bigger than you thought."

These movements are not just technical tools. They're rhetorical. They make arguments about what matters in the frame.

And that's the real win with LTX-2 camera control LoRAs. You're not just generating video. You're directing it. You're making intentional creative choices, frame by frame, and the model executes them with precision.

The skills translate directly to real camera operation.

The fundamentals—how movement guides attention, how pacing affects emotion, how combining movements creates sophisticated visual language—these principles are the same whether you're operating a RED camera on a film set or writing motion guidance for an AI model.

Start with simple movements. Stack them together. Experiment with timing and intensity. Watch how subtle changes shift the feeling of the video. You'll develop instincts for what works.

The camera is yours now. Move it intentionally.
