ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation

Omar El Khalifi; Thomas Rossi; Oscar Fossey; Thibault Fouque; Ulysse Mizrahi; Philip Torr; Ivan Laptev; Fabio Pizzati; Baptiste Bellot-Gurlet

arXiv:2605.06667·cs.CV·May 8, 2026

TL;DR

ActCam is a zero-shot video generation method that jointly controls character motion and camera trajectory, improving scene consistency and motion fidelity without training.

Contribution

ActCam introduces a staged conditioning approach for joint camera and motion control in video generation, built on pretrained diffusion models and requiring no additional training.

Findings

1. Outperforms pose-only control in camera adherence and motion fidelity.

2. Preferred in human evaluations, especially with large viewpoint changes.

3. Enables joint camera and motion control without training.

Abstract

For artistic applications, video generation requires fine-grained control over both performance and cinematography, i.e., the actor's motion and the camera trajectory. We present ActCam, a zero-shot method for video generation that jointly transfers character motion from a driving video into a new scene and enables per-frame control of intrinsic and extrinsic camera parameters. ActCam builds on any pretrained image-to-video diffusion model that accepts conditioning in terms of scene depth and character pose. Given a source video with a moving character and a target camera motion, ActCam generates pose and depth conditions that remain geometrically consistent across frames. We then run a single sampling process with a two-phase conditioning schedule: early denoising steps condition on both pose and sparse depth to enforce scene structure, after which depth is dropped and pose-only…
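
The two-phase conditioning schedule described in the abstract can be illustrated with a minimal sketch. It only shows which conditions are active at each denoising step and does not reproduce the authors' implementation. The function name `conditioning_schedule`, the parameter `num_depth_steps`, and the example values (10 steps, switch after 3) are hypothetical and not taken from the paper.

```python
from typing import List, Set


def conditioning_schedule(num_steps: int, num_depth_steps: int) -> List[Set[str]]:
    """Return the set of active conditions for each denoising step.

    Steps are indexed in sampling order (step 0 is the noisiest).
    num_depth_steps is an assumed knob: how many early steps use
    sparse depth in addition to pose before depth is dropped.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    if not 0 <= num_depth_steps <= num_steps:
        raise ValueError("num_depth_steps must lie in [0, num_steps]")

    schedule: List[Set[str]] = []
    for step in range(num_steps):
        if step < num_depth_steps:
            # Phase 1: pose and sparse depth together enforce scene structure.
            schedule.append({"pose", "sparse_depth"})
        else:
            # Phase 2: depth is dropped and only pose conditioning remains.
            schedule.append({"pose"})
    return schedule


# Illustrative values only; the paper's actual step counts are not given here.
schedule = conditioning_schedule(num_steps=10, num_depth_steps=3)
for step, active in enumerate(schedule):
    print(step, sorted(active))
```

In a real sampler, the set returned for each step would decide which control signals are passed to the pretrained image-to-video model at that step; because both phases belong to one sampling run, the latent from the last depth-conditioned step carries straight into the pose-only steps.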

Code & Models

Repositories

https://elkhomar.github.io/actcam
github