Pose-Free Generalizable Rendering Transformer

Zhiwen Fan; Panwang Pan; Peihao Wang; Yifan Jiang; Hanwen Jiang; Dejia Xu; Zehao Zhu; Dilin Wang; Zhangyang Wang

arXiv:2310.03704 · cs.CV · December 29, 2023

TL;DR

PF-GRT is a pose-free rendering transformer that synthesizes novel views without pre-computed camera poses. It generalizes to unseen scenes and is reported to remain robust where pose-based pipelines degrade.

Contribution

The paper presents PF-GRT, a novel framework that eliminates the need for camera pose estimation in view synthesis by learning feature matching directly from data.

Findings

1. Achieves superior photo-realistic image quality in zero-shot rendering tasks.
2. Demonstrates robustness to noisy camera pose inputs.
3. Generalizes well to unseen scenes without pose information.

Abstract

In the field of novel-view synthesis, the necessity of knowing camera poses (e.g., via Structure from Motion) before rendering has been a common practice. However, the consistent acquisition of accurate camera poses remains elusive, and errors in pose extraction can adversely impact the view synthesis process. To address this challenge, we introduce PF-GRT, a new Pose-Free framework for Generalizable Rendering Transformer, eliminating the need for pre-computed camera poses and instead leveraging feature-matching learned directly from data. PF-GRT is parameterized using a local relative coordinate system, where one of the source images is set as the origin. An OmniView Transformer is designed for fusing multi-view cues under the pose-free setting, where unposed-view fusion and origin-centric aggregation are performed. The 3D point feature along target ray is sampled by projecting onto…
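
The abstract's idea of fusing multi-view cues through feature matching, rather than through known camera poses, can be illustrated with plain scaled dot-product attention. The sketch below is not the authors' OmniView Transformer. The function names (`softmax`, `cross_attend`) and the toy feature vectors are hypothetical, chosen only to show how a token from the origin view could weight source-view tokens by feature similarity without any camera matrix.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query:  feature of one origin-view token (list of floats)
    keys:   features of source-view tokens (list of lists)
    values: features to aggregate, one per key
    Returns the attention weights and the fused feature.
    """
    scale = math.sqrt(len(query))
    # Similarity between the origin token and every source token.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of the source features, one output channel at a time.
    fused = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, fused


# Hypothetical toy features: one origin-view token, three source-view tokens.
origin_token = [1.0, 0.0]
source_tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, fused = cross_attend(origin_token, source_tokens, source_tokens)
```

In this toy setup the source token most similar to the origin token receives the largest weight, so correspondence is expressed through learned features alone. A real model would learn the query, key, and value projections from data and run many tokens and heads in parallel.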

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Computer Graphics and Visualization Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · RoIAlign · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer