Multiple View Performers for Shape Completion
David Watkins, Peter Allen, Krzysztof Choromanski, Jacob Varley, and Nicholas Waytowich

TL;DR
The paper introduces MVP, a novel Transformer-based architecture for 3D shape completion from sequential views, leveraging memory-efficient attention and achieving registration-free reconstruction.
Contribution
According to the authors, MVP is the first multiple view voxel reconstruction method that does not require registration of multiple depth views, and the first causal Transformer-based model for 3D shape completion.
Findings
MVP outperforms baseline models in shape completion accuracy.
The model generalizes well across different sequences and scenarios.
It handles long observation sequences with a compact associative memory whose size is independent of the history length.
Abstract
We propose the Multiple View Performer (MVP), a new architecture for 3D shape completion from a series of temporally sequential views. MVP accomplishes this task by using linear-attention Transformers called Performers. Our model allows the current observation of the scene to attend to the previous ones for more accurate infilling. The history of past observations is compressed into a compact associative memory that approximates a modern continuous Hopfield memory but, crucially, has a size independent of the history length. We compare our model with several baselines for shape completion over time, demonstrating the generalization gains that MVP provides. To the best of our knowledge, MVP is the first multiple view voxel reconstruction method that does not require registration of multiple depth views and the first causal Transformer-based model for 3D shape completion.
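The fixed-size memory described in the abstract can be illustrated with a minimal sketch of causal linear attention using positive random features, in the spirit of Performers. This is not the authors' implementation: the class name `LinearAttentionMemory`, the feature count, and the toy dimensions are all assumptions made for illustration. The point it shows is that the running state has the same shape no matter how many past observations have been written into it.

```python
import numpy as np


def positive_random_features(x, w):
    """Positive random features whose dot products approximate exp(q . k).

    x: vector of shape (d,); w: Gaussian projection matrix of shape (m, d).
    """
    m = w.shape[0]
    return np.exp(w @ x - 0.5 * np.dot(x, x)) / np.sqrt(m)


class LinearAttentionMemory:
    """Hypothetical fixed-size memory for causal linear attention."""

    def __init__(self, d_key, d_value, num_features=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((num_features, d_key))
        # Running sums over all past (key, value) pairs.
        self.S = np.zeros((num_features, d_value))  # sum of phi(k) v^T
        self.z = np.zeros(num_features)             # sum of phi(k)

    def write(self, key, value):
        """Fold one past observation into the state; the state shape is unchanged."""
        phi_k = positive_random_features(key, self.w)
        self.S += np.outer(phi_k, value)
        self.z += phi_k

    def read(self, query):
        """Attend from the current query to everything written so far."""
        phi_q = positive_random_features(query, self.w)
        return (phi_q @ self.S) / (phi_q @ self.z + 1e-9)


# Toy usage: 100 past observations, one read from the current observation.
rng = np.random.default_rng(1)
memory = LinearAttentionMemory(d_key=8, d_value=4)
for _ in range(100):
    memory.write(0.1 * rng.standard_normal(8), rng.standard_normal(4))
out = memory.read(0.1 * rng.standard_normal(8))
print(memory.S.shape, out.shape)
```

Because each read is a normalized, positively weighted average of the stored values, the cost of attending to the history depends only on the number of random features and the value dimension, not on the number of past views.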
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Fast Attention Via Positive Orthogonal Random Features · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Performer · Adam · Softmax · Dropout
