Pixel-to-4D: Camera-Controlled Image-to-Video Generation with Dynamic 3D Gaussians

Melonie de Almeida; Daniela Ivanova; Tong Shi; John H. Williamson; Paul Henderson

arXiv:2601.00678·cs.CV·May 18, 2026

TL;DR

This paper introduces a dynamic 3D Gaussian scene representation for fast, camera-controlled video generation from a single image, and reports state-of-the-art video quality and inference efficiency.

Contribution

The paper proposes a framework that constructs a 3D Gaussian scene and samples object motion in one pass, which enables fast, controllable video synthesis without iterative denoising.
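As a rough illustration of why an explicit dynamic 3D representation gives direct camera control, the toy sketch below moves Gaussian centers over time and projects them through a user-chosen camera path. It is not the authors' implementation: the `Gaussian` class, the linear per-frame `velocity` motion model, the rotation-free pinhole camera, and all function names are assumptions made for this example, and only the centers are projected (no covariance, opacity, or color splatting).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    mean: Vec3      # 3D center at frame 0, in world coordinates
    velocity: Vec3  # hypothetical linear displacement per frame


def position_at(g: Gaussian, t: int) -> Vec3:
    """Center of the Gaussian at frame t under the assumed linear motion."""
    return tuple(m + t * v for m, v in zip(g.mean, g.velocity))


def project(point: Vec3, cam_pos: Vec3, focal: float = 1.0) -> Optional[Tuple[float, float]]:
    """Pinhole projection for a camera at cam_pos looking down +z, no rotation."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    if z <= 0:
        return None  # point is behind the camera
    return (focal * x / z, focal * y / z)


def render_centers(gaussians: List[Gaussian], camera_path: List[Vec3], focal: float = 1.0):
    """For each frame, project every moved Gaussian center into that frame's camera."""
    frames = []
    for t, cam_pos in enumerate(camera_path):
        frames.append([project(position_at(g, t), cam_pos, focal) for g in gaussians])
    return frames


if __name__ == "__main__":
    scene = [
        Gaussian(mean=(0.0, 0.0, 4.0), velocity=(0.0, 0.0, 0.0)),  # static background point
        Gaussian(mean=(1.0, 0.0, 4.0), velocity=(0.5, 0.0, 0.0)),  # object drifting along +x
    ]
    dolly_in = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]  # camera moves forward
    for t, frame in enumerate(render_centers(scene, dolly_in)):
        print(t, frame)
```

Because scene motion and camera motion are separate inputs here, changing `dolly_in` alters the viewpoint without touching the object motion, which is the kind of decoupling an explicit 3D scene makes possible.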

Findings

01 Achieves state-of-the-art video quality on multiple datasets.

02 Enables fast inference without iterative denoising.

03 Provides precise camera control and coherent object motion.

Abstract

Humans excel at forecasting the future dynamics of a scene given just a single image. Video generation models that can mimic this ability are an essential component for intelligent systems. Recent approaches have improved temporal coherence and 3D consistency in single-image-conditioned video generation. However, these methods often lack robust user controllability, such as modifying the camera path, limiting their applicability in real-world applications. Most existing camera-controlled image-to-video models struggle with accurately modeling camera motion, maintaining temporal consistency, and preserving geometric integrity. Leveraging explicit intermediate 3D representations offers a promising solution by enabling coherent video generation aligned with a given camera trajectory. Although these methods often use 3D point clouds to render scenes and introduce object motion in a later…

Code & Models

Repositories

https://melonienimasha.github.io/Pixel-to-4D-Website