Is Attention All That NeRF Needs?
Mukund Varma T, Peihao Wang, Xuxi Chen, Tianlong Chen, Subhashini Venugopalan, Zhangyang Wang

TL;DR
The paper introduces GNT, a transformer-based model that generalizes NeRFs for novel view synthesis across scenes, outperforming previous methods and learning physically-grounded rendering through attention mechanisms.
Contribution
GNT is the first transformer-based architecture that generalizes NeRFs across scenes, using attention to learn scene representation and rendering without explicit formulas.
Findings
GNT achieves state-of-the-art transfer performance on unseen scenes.
GNT can reconstruct NeRFs from a single scene without explicit rendering equations.
Attention maps in GNT reveal physically-grounded cues like depth and occlusion.
Abstract
We present Generalizable NeRF Transformer (GNT), a transformer-based architecture that reconstructs Neural Radiance Fields (NeRFs) and learns to render novel views on the fly from source views. While prior works on NeRFs optimize a scene representation by inverting a handcrafted rendering equation, GNT achieves neural representation and rendering that generalizes across scenes using transformers at two stages. (1) The view transformer leverages multi-view geometry as an inductive bias for attention-based scene representation, and predicts coordinate-aligned features by aggregating information from epipolar lines on the neighboring views. (2) The ray transformer renders novel views using attention to decode the features from the view transformer along the sampled points during ray marching. Our experiments demonstrate that when optimized on a single scene, GNT can successfully…
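The ray-transformer stage described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single attention head, random untrained weights, mean pooling over the sampled points, and a sigmoid to map the pooled feature to RGB. All function names (`self_attention`, `render_ray`, `random_matrix`) and sizes are hypothetical. It only shows how attention weights over points along a ray can stand in for the fixed weights of a volume rendering formula.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    return [dot(row, x) for row in W]

def self_attention(points, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention over the
    # per-point features sampled along one ray.
    q = [matvec(Wq, p) for p in points]
    k = [matvec(Wk, p) for p in points]
    v = [matvec(Wv, p) for p in points]
    scale = math.sqrt(len(q[0]))
    out, weights = [], []
    for qi in q:
        w = softmax([dot(qi, kj) / scale for kj in k])
        weights.append(w)
        out.append([sum(w[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out, weights

def render_ray(points, Wq, Wk, Wv, W_rgb):
    # Assumption: pool attended features by averaging over points,
    # then map to RGB with a linear layer and a sigmoid.
    attended, weights = self_attention(points, Wq, Wk, Wv)
    n = len(attended)
    pooled = [sum(f[c] for f in attended) / n
              for c in range(len(attended[0]))]
    rgb = [1.0 / (1.0 + math.exp(-x)) for x in matvec(W_rgb, pooled)]
    return rgb, weights

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Toy input: 8 sampled points on one ray, each with a 4-dim feature
# (standing in for the view transformer's coordinate-aligned features).
rng = random.Random(0)
dim, n_points = 4, 8
points = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_points)]
Wq, Wk, Wv = (random_matrix(dim, dim, rng) for _ in range(3))
W_rgb = random_matrix(3, dim, rng)

rgb, weights = render_ray(points, Wq, Wk, Wv, W_rgb)
```

Each row of `weights` is a distribution over the points on the ray. In a trained model, attention maps of this kind are the sort of quantity one would inspect for the depth and occlusion cues mentioned under Findings.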
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis
Methods: Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Dropout · Adam · Byte Pair Encoding · Label Smoothing · Multi-Head Attention · Residual Connection
