Pose-Free Generalizable Rendering Transformer
Zhiwen Fan, Panwang Pan, Peihao Wang, Yifan Jiang, Hanwen Jiang, Dejia Xu, Zehao Zhu, Dilin Wang, Zhangyang Wang

TL;DR
PF-GRT introduces a pose-free rendering transformer that learns to synthesize novel views without pre-computed camera poses, achieving high-quality, generalizable, and robust view synthesis across diverse datasets.
Contribution
The paper presents PF-GRT, a novel framework that eliminates the need for camera pose estimation in view synthesis by learning feature matching directly from data.
Findings
Achieves superior photo-realistic image quality in zero-shot rendering tasks.
Demonstrates robustness to noisy camera pose inputs.
Generalizes well to unseen scenes without pose information.
Abstract
In the field of novel-view synthesis, the necessity of knowing camera poses (e.g., via Structure from Motion) before rendering has been a common practice. However, the consistent acquisition of accurate camera poses remains elusive, and errors in pose extraction can adversely impact the view synthesis process. To address this challenge, we introduce PF-GRT, a new Pose-Free framework for Generalizable Rendering Transformer, eliminating the need for pre-computed camera poses and instead leveraging feature-matching learned directly from data. PF-GRT is parameterized using a local relative coordinate system, where one of the source images is set as the origin. An OmniView Transformer is designed for fusing multi-view cues under the pose-free setting, where unposed-view fusion and origin-centric aggregation are performed. The 3D point feature along target ray is sampled by projecting onto…
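The origin-centric parameterization described above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and the choice of the origin view's image plane as the projection target are all assumptions made for illustration, since the abstract excerpt is cut off before naming the target. The idea shown is only that, once one source view is declared the origin, points sampled along a target ray can be projected into that view without any extrinsic pose, because its extrinsics are the identity by definition.

```python
# Hypothetical sketch: every name and the pinhole intrinsics are illustrative
# assumptions, not taken from the PF-GRT paper or its code.

def sample_ray_points(origin, direction, near, far, n_samples):
    """Return n_samples evenly spaced 3D points on the ray origin + t * direction."""
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    step = (far - near) / (n_samples - 1)
    return [
        tuple(o + (near + i * step) * d for o, d in zip(origin, direction))
        for i in range(n_samples)
    ]


def project_to_origin_view(point, fx, fy, cx, cy):
    """Project a 3D point, given in the origin view's camera frame, to pixels.

    The origin view defines the coordinate system, so its extrinsics are the
    identity and only the (assumed) pinhole intrinsics are needed.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    # A target ray expressed in the origin view's frame, offset slightly in x.
    points = sample_ray_points(
        (0.1, 0.0, 0.0), (0.0, 0.0, 1.0), near=1.0, far=3.0, n_samples=3
    )
    for p in points:
        uv = project_to_origin_view(p, fx=100.0, fy=100.0, cx=64.0, cy=64.0)
        print(p, "->", uv)
```

In a learned pipeline, the resulting pixel locations would typically be used to look up image features (for example by bilinear interpolation); that step is omitted here because the excerpt does not describe it.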
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Computer Graphics and Visualization Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Byte Pair Encoding · Softmax · RoIAlign · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer
