PlaneFormers: From Sparse View Planes to 3D Reconstruction
Samir Agarwala, Linyi Jin, Chris Rockwell, David F. Fouhey

TL;DR
PlaneFormers is a transformer-based method for reconstructing 3D planar surfaces from images with limited overlap. It handles single-image 3D reconstruction, correspondence between images, and relative camera pose estimation within one framework.
Contribution
The paper presents the PlaneFormer, a transformer applied to 3D-aware plane tokens, as a simpler alternative to prior optimization-based approaches for planar reconstruction from sparse views.
Findings
Outperforms prior optimization-based methods
Effective in scenes with limited image overlap
Highlights importance of 3D-specific design choices
Abstract
We present an approach for the planar surface reconstruction of a scene from images with limited overlap. This reconstruction task is challenging since it requires jointly reasoning about single image 3D reconstruction, correspondence between images, and the relative camera pose between images. Past work has proposed optimization-based approaches. We introduce a simpler approach, the PlaneFormer, that uses a transformer applied to 3D-aware plane tokens to perform 3D reasoning. Our experiments show that our approach is substantially more effective than prior work, and that several 3D-specific design decisions are crucial for its success.
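The abstract does not spell out what makes a plane token "3D-aware". As a rough illustration only, the sketch below assumes that a token pairs an appearance embedding with plane parameters (unit normal `n` and offset `d`, with the plane defined by `n . x = d`) re-expressed in a shared camera frame under a candidate relative pose `(R, t)`. The function names `transform_plane` and `make_plane_token`, the token layout, and the pose convention `x1 = R x2 + t` are assumptions for this example, not details taken from the paper.

```python
def transform_plane(normal, offset, R, t):
    """Re-express the plane n . x = d from camera-2 coordinates in camera-1
    coordinates, assuming points map as x1 = R x2 + t (hypothetical convention)."""
    # Rotate the normal: n1 = R n
    n1 = [sum(R[i][j] * normal[j] for j in range(3)) for i in range(3)]
    # Shift the offset: d1 = d + n1 . t
    d1 = offset + sum(n1[i] * t[i] for i in range(3))
    return n1, d1


def make_plane_token(appearance, normal, offset, R, t):
    """Build a hypothetical plane token: appearance features followed by the
    plane's normal and offset in the shared (camera-1) frame."""
    n1, d1 = transform_plane(normal, offset, R, t)
    return list(appearance) + n1 + [d1]


# Candidate pose: 90-degree rotation about the z-axis plus a shift along x.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]

# Plane x = 2 seen from camera 2, with a made-up 4-D appearance embedding.
token = make_plane_token([0.1, 0.2, 0.3, 0.4], [1.0, 0.0, 0.0], 2.0, R, t)
print(token)
```

With both views' planes expressed in one frame, a transformer over such tokens can compare geometry directly, which is one plausible way for correspondence and pose to be reasoned about together.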
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
