Towards Realistic and Consistent Orbital Video Generation via 3D Foundation Priors

Rong Wang; Ruyi Zha; Ziang Cheng; Jiayu Yang; Pulak Purkait; Hongdong Li

arXiv:2604.12309 · cs.CV · April 15, 2026

TL;DR

This paper introduces a new method for generating realistic and consistent orbital videos from a single image by leveraging 3D shape priors from a foundational generative model, improving long-range view synthesis.

Contribution

The paper integrates 3D shape priors into video generation through a multi-scale 3D adapter, improving shape realism and view consistency over prior methods that rely on pixel-wise attention.

Findings

01. Outperforms state-of-the-art methods in visual quality and shape realism.

02. Achieves superior multi-view consistency and generalization to complex trajectories.

03. Effectively models complete object shapes without explicit mesh extraction.

Abstract

We present a novel method for generating geometrically realistic and consistent orbital videos from a single image of an object. Existing video generation works mostly rely on pixel-wise attention to enforce view consistency across frames. However, such a mechanism does not impose sufficient constraints for long-range extrapolation, e.g., rear-view synthesis, in which pixel correspondences to the input image are limited. Consequently, these works often fail to produce results with a plausible and coherent structure. To tackle this issue, we propose to leverage rich shape priors from a 3D foundational generative model as an auxiliary constraint, motivated by its capability of modeling realistic object shape distributions learned from large 3D asset corpora. Specifically, we prompt the video generation with two scales of latent features encoded by the 3D foundation model: (i) a denoised…
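To make the rear-view difficulty concrete, the sketch below lays out camera positions for a simple circular orbit and finds the frame that is farthest in angle from the input view. It is an illustration only and is not taken from the paper: the frame count, radius, elevation, z-up coordinate convention, and the function names `orbit_camera_positions` and `angular_offset_from_input` are all assumptions chosen for the example.

```python
import math

def orbit_camera_positions(num_frames, radius=2.0, elevation_deg=0.0):
    """Evenly spaced camera positions on one full orbit around the origin.

    Assumed convention (not from the paper): the object sits at the origin,
    z points up, and frame 0 is the input view at azimuth 0 degrees.
    Returns a list of (azimuth_deg, (x, y, z)) tuples.
    """
    elev = math.radians(elevation_deg)
    poses = []
    for i in range(num_frames):
        azim_deg = 360.0 * i / num_frames
        azim = math.radians(azim_deg)
        x = radius * math.cos(elev) * math.cos(azim)
        y = radius * math.cos(elev) * math.sin(azim)
        z = radius * math.sin(elev)
        poses.append((azim_deg, (x, y, z)))
    return poses

def angular_offset_from_input(azim_deg):
    """Smallest absolute angle (degrees) between a frame and the input view."""
    a = azim_deg % 360.0
    return min(a, 360.0 - a)

poses = orbit_camera_positions(num_frames=8)
offsets = [angular_offset_from_input(azim) for azim, _ in poses]

# The frame with the largest offset is the rear view: the camera sits on the
# opposite side of the object from the input image.
rear_index = max(range(len(offsets)), key=offsets.__getitem__)
print(rear_index, offsets[rear_index], poses[rear_index][1])
```

With an even number of frames, the rear view falls exactly halfway through the orbit, 180 degrees from the input. At that angle the camera sees the side of the object facing away from the input image, which is the case the abstract describes as having limited pixel correspondences.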
