VERTIGO: Visual Preference Optimization for Cinematic Camera Trajectory Generation

Mengtian Li; Yuwei Lu; Feifei Li; Chenqi Gan; Zhifeng Xie; and Xi Wang

arXiv:2604.02467·cs.CV·April 29, 2026


TL;DR

VERTIGO optimizes cinematic camera trajectory generators with visual preference signals: generated camera motion is rendered in real time, and a vision-language model scores the resulting previews, which improves framing and aesthetic quality.

Contribution

The paper introduces a visual preference optimization method for camera trajectory generators, built on real-time rendering in Unity and a cyclic semantic similarity score computed by a cinematically fine-tuned vision-language model.

Findings

01

VERTIGO reduces the rate of off-screen characters from 38% to nearly 0%.

02

The framework improves framing quality and perceptual realism.

03

User studies favor VERTIGO over baseline methods.
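
The off-screen figure in finding 01 can be made concrete with a small sketch. The summary does not define the metric, so the code below assumes a per-frame count: a character is off-screen in a frame when its position falls outside a pinhole camera's view frustum. The function names, the +z-forward camera convention, and the default field of view and aspect ratio are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def is_on_screen(point_world, cam_pos, cam_rot, fov_y_deg=60.0, aspect=16 / 9):
    """Return True if a world-space point lies inside the camera frustum.

    Assumptions (hypothetical, not from the paper): cam_rot is a 3x3
    world-to-camera rotation matrix and the camera looks along +z.
    """
    # Express the point in camera coordinates.
    p = np.asarray(cam_rot, dtype=float) @ (
        np.asarray(point_world, dtype=float) - np.asarray(cam_pos, dtype=float)
    )
    if p[2] <= 0.0:
        return False  # behind the camera
    # Half-extent of the view at unit depth, vertically and horizontally.
    tan_y = np.tan(np.radians(fov_y_deg) / 2.0)
    tan_x = tan_y * aspect
    return bool(abs(p[0] / p[2]) <= tan_x and abs(p[1] / p[2]) <= tan_y)


def off_screen_rate(points_world, cam_positions, cam_rots, **kwargs):
    """Fraction of frames in which the character is outside the frame."""
    if len(points_world) == 0:
        raise ValueError("need at least one frame")
    off = [
        not is_on_screen(p, c, r, **kwargs)
        for p, c, r in zip(points_world, cam_positions, cam_rots)
    ]
    return sum(off) / len(off)
```

Under this reading, a rate of 0.38 would mean the character is outside the frame in 38% of frames. A per-shot definition (any frame off-screen marks the whole shot) would give different numbers.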

Abstract

Cinematic camera control relies on a tight feedback loop between director and cinematographer, where camera motion and framing are continuously reviewed and refined. Recent generative camera systems can produce diverse, text-conditioned trajectories, but they lack this "director in the loop" and have no explicit supervision of whether a shot is visually desirable. This results in in-distribution camera motion but poor framing, off-screen characters, and undesirable visual aesthetics. In this paper, we introduce VERTIGO, the first framework for visual preference optimization of camera trajectory generators. Our framework leverages a real-time graphics engine (Unity) to render 2D visual previews from generated camera motion. A cinematically fine-tuned vision-language model then scores these previews using our proposed cyclic semantic similarity mechanism, which aligns renders with text…
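
The scoring step described in the abstract can be illustrated with a minimal sketch of how preview scores might be turned into preference data. The abstract is truncated before it explains the cyclic semantic similarity mechanism, so this sketch does not attempt to reproduce it: a simple token-overlap (Jaccard) similarity between the text prompt and a text description of each render stands in for the vision-language model score. The `Preview` class, the `caption` field, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Preview:
    trajectory_id: str
    # Hypothetical: a text description of the rendered preview,
    # e.g. as produced by a vision-language model.
    caption: str


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for the paper's VLM-based score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_preference_pair(prompt: str, previews: list[Preview]) -> tuple[str, str]:
    """Return (preferred_id, rejected_id) from the best and worst scored previews."""
    if len(previews) < 2:
        raise ValueError("need at least two previews to form a pair")
    ranked = sorted(previews, key=lambda p: jaccard(prompt, p.caption), reverse=True)
    return ranked[0].trajectory_id, ranked[-1].trajectory_id
```

For example, given the prompt "camera orbits around the character", a preview captioned with the same words outranks one captioned "empty sky", and the two form a (preferred, rejected) pair that a preference-optimization objective could consume.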
