Is Attention All That NeRF Needs?

Mukund Varma T; Peihao Wang; Xuxi Chen; Tianlong Chen; Subhashini Venugopalan; Zhangyang Wang

arXiv:2207.13298 · cs.CV · March 3, 2023 · 6 cites

TL;DR

The paper introduces GNT, a transformer-based model that generalizes NeRFs for novel view synthesis across scenes, outperforming previous methods and learning physically-grounded rendering through attention mechanisms.

Contribution

GNT is the first transformer-based architecture that generalizes NeRFs across scenes, using attention to learn scene representation and rendering without explicit formulas.

Findings

01

GNT achieves state-of-the-art transfer performance on unseen scenes.

02

GNT can reconstruct NeRFs from a single scene without explicit rendering equations.

03

Attention maps in GNT reveal physically-grounded cues like depth and occlusion.
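One way a depth cue could be read out of per-ray attention weights is sketched below. This is a hedged illustration, not the authors' procedure: it assumes the weights over the sampled points on a ray form a probability distribution (for example, the output of a softmax) and treats the attention-weighted mean of the sample depths as the depth estimate. The names `softmax`, `expected_depth`, `sample_depths`, and `scores` are hypothetical and the numbers are toy values.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def expected_depth(attn_weights, sample_depths):
    """Attention-weighted mean of the depths of the points sampled on a ray.

    Assumption: attn_weights sums to 1, so the result is an expectation.
    """
    return sum(w * t for w, t in zip(attn_weights, sample_depths))

# Toy ray with four samples; the second sample gets most of the attention.
sample_depths = [1.0, 2.0, 3.0, 4.0]
scores = [0.0, 5.0, 0.0, 0.0]          # hypothetical pre-softmax attention scores
weights = softmax(scores)
depth = expected_depth(weights, sample_depths)
print(f"estimated depth: {depth:.3f}")
```

When attention concentrates on a single sample, the estimate approaches that sample's depth, which is the sense in which peaked attention along a ray can behave like a surface or occlusion indicator.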

Abstract

We present Generalizable NeRF Transformer (GNT), a transformer-based architecture that reconstructs Neural Radiance Fields (NeRFs) and learns to render novel views on the fly from source views. While prior works on NeRFs optimize a scene representation by inverting a handcrafted rendering equation, GNT achieves neural representation and rendering that generalizes across scenes using transformers at two stages. (1) The view transformer leverages multi-view geometry as an inductive bias for attention-based scene representation, and predicts coordinate-aligned features by aggregating information from epipolar lines on the neighboring views. (2) The ray transformer renders novel views using attention to decode the features from the view transformer along the sampled points during ray marching. Our experiments demonstrate that when optimized on a single scene, GNT can successfully…
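The two stages described in the abstract can be sketched with plain scaled dot-product attention. This is a minimal illustration under strong assumptions, not the implementation in vita-group/gnt: there are no learned projections, positional encodings, or multiple heads, and the per-view features are assumed to have already been sampled where each 3D point projects onto the source views. All names (`attend`, `view_stage`, `ray_stage`) and all numbers are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Single-query scaled dot-product attention; returns (output, weights)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights

def view_stage(point_query, source_view_features):
    """Stage 1 (sketch): aggregate one point's features across source views."""
    feature, _ = attend(point_query, source_view_features, source_view_features)
    return feature

def ray_stage(ray_query, point_features):
    """Stage 2 (sketch): aggregate per-point features along one ray."""
    return attend(ray_query, point_features, point_features)

# Toy setup: 3 points sampled on a ray, 2 source views, 2-dimensional features.
per_point_views = [
    [[1.0, 0.0], [0.8, 0.2]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[0.5, 0.5], [0.5, 0.5]],
]
point_queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

point_features = [view_stage(q, f) for q, f in zip(point_queries, per_point_views)]
ray_feature, ray_weights = ray_stage([0.0, 1.0], point_features)
print("ray feature:", ray_feature)
print("ray attention weights:", ray_weights)
```

In this sketch the ray-stage weights play the role that a handcrafted volume-rendering formula would otherwise play: they decide how much each sampled point contributes to the final per-ray feature, which a further (here omitted) layer would map to a color.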

Code & Models

Repositories

vita-group/gnt
pytorch

Videos

Is Attention All That NeRF Needs? · slideslive

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis

Methods: Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Dropout · Adam · Byte Pair Encoding · Label Smoothing · Multi-Head Attention · Residual Connection