PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention

Yipeng Chen; Zhichao Ye; Zhenzhou Fang; Xinyu Chen; Xiaoyu Zhang; Jialing Liu; Nan Wang; Haomin Liu; Guofeng Zhang

arXiv:2511.17185 · cs.CV · November 24, 2025

TL;DR

PostCam is a novel framework that enables post-capture editing of camera views in dynamic videos, improving control precision and visual quality through a query-shared cross-attention mechanism and a two-stage training process.

Contribution

PostCam introduces a query-shared cross-attention module that integrates camera poses and rendered video frames, enabling more accurate and flexible view synthesis in dynamic scenes.

Findings

01 Outperforms state-of-the-art methods by over 20% in control precision

02 Achieves higher view consistency and visual fidelity

03 Effective on both real-world and synthetic datasets

Abstract

We propose PostCam, a framework for novel-view video generation that enables post-capture editing of camera trajectories in dynamic scenes. We find that existing video recapture methods inject camera motion suboptimally, which both limits camera control precision and causes generated videos to lose fine visual details from the source video. To achieve more accurate and flexible motion manipulation, PostCam introduces a query-shared cross-attention module that integrates two distinct forms of control signals: the 6-DoF camera poses and the 2D rendered video frames. By fusing them into a unified representation within a shared feature space, our model can extract underlying motion cues, which enhances both control precision and generation quality. Furthermore, we adopt a two-stage training strategy: the model first learns…
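
The abstract does not spell out the module's internals, so the following is only a minimal NumPy sketch of one plausible reading of "query-shared cross-attention": a single query projection of the video tokens is reused by two cross-attention branches, one attending to camera-pose tokens and one to rendered-frame tokens, and the branch outputs are fused in the same feature space. Everything here is an assumption made for illustration, not the authors' implementation: the function and parameter names, the additive fusion with a residual connection, the single attention head, and the encoding of each 6-DoF pose as a flattened 3x4 [R|t] matrix (12 values).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, context, w_k, w_v):
    # q: (n_query, d_model), already projected.
    # context: (n_ctx, d_ctx), projected here into keys and values.
    k = context @ w_k                          # (n_ctx, d_model)
    v = context @ w_v                          # (n_ctx, d_model)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_query, n_ctx)
    return softmax(scores, axis=-1) @ v        # (n_query, d_model)

def query_shared_cross_attention(x, pose_tokens, render_tokens, params):
    # The query projection is computed once and shared by both branches.
    q = x @ params["w_q"]
    pose_out = cross_attention(
        q, pose_tokens, params["w_k_pose"], params["w_v_pose"])
    render_out = cross_attention(
        q, render_tokens, params["w_k_render"], params["w_v_render"])
    # Assumed fusion: sum the branches, project, and add a residual.
    return x + (pose_out + render_out) @ params["w_o"]

def init_params(rng, d_model, d_pose, d_render, scale=0.02):
    # Hypothetical parameter layout: separate key/value weights per branch.
    shapes = {
        "w_q": (d_model, d_model),
        "w_k_pose": (d_pose, d_model),
        "w_v_pose": (d_pose, d_model),
        "w_k_render": (d_render, d_model),
        "w_v_render": (d_render, d_model),
        "w_o": (d_model, d_model),
    }
    return {name: scale * rng.standard_normal(shape)
            for name, shape in shapes.items()}

rng = np.random.default_rng(0)
d_model, d_pose, d_render = 16, 12, 16
params = init_params(rng, d_model, d_pose, d_render)

# Random stand-ins for real features (not valid rotations or images).
x = rng.standard_normal((8, d_model))               # 8 video latent tokens
pose_tokens = rng.standard_normal((4, d_pose))      # 4 per-frame pose tokens
render_tokens = rng.standard_normal((8, d_render))  # 8 rendered-frame tokens

out = query_shared_cross_attention(x, pose_tokens, render_tokens, params)
print(out.shape)  # (8, 16)
```

In this sketch, sharing the query means both control signals are scored against the same representation of the video tokens, which is one way the two signals could end up in a common feature space. A real model would likely use multi-head attention, normalization, and learned pose embeddings, none of which are specified in the abstract.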

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Human Pose and Action Recognition