VERAM: View-Enhanced Recurrent Attention Model for 3D Shape Classification
Songle Chen, Lintao Zheng, Yan Zhang, Zhixin Sun, Kai Xu

TL;DR
VERAM is an active view selection model for 3D shape classification. It improves accuracy by actively choosing informative views, addresses the training difficulties common in attention-based networks, and achieves state-of-the-art results on the ModelNet datasets.
Contribution
The paper introduces VERAM, a recurrent attention model with view-enhancement strategies for improved 3D shape classification accuracy.
Findings
VERAM achieves over 95% accuracy on ModelNet10 and ModelNet40 datasets.
The model outperforms existing multi-view methods with the same number of views.
Enhanced training strategies improve the balance between view estimation and classification subnetworks.
Abstract
Multi-view deep neural networks are perhaps the most successful approach to 3D shape classification. However, fusing multi-view features with max or average pooling provides no view selection mechanism, which limits their use in, e.g., multi-view active object recognition by a robot. This paper presents VERAM, a recurrent attention model capable of actively selecting a sequence of views for highly accurate 3D shape classification. VERAM addresses an important issue common to existing attention-based models, i.e., the unbalanced training of the subnetworks responsible for next-view estimation and shape classification. The classification subnetwork easily overfits while the view estimation subnetwork is usually poorly trained, leading to suboptimal classification performance. This is overcome by three essential view-enhancement strategies: 1) enhancing the information…
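The contrast the abstract draws, between order-invariant max or average pooling over all views and a recurrent model that chooses its next view from what it has already seen, can be sketched in plain Python. This is a toy illustration under stated assumptions, not VERAM's architecture: `get_view_feature`, `score_next_view`, the scalar `tanh` update, and the 0.5 recurrence weight are hypothetical stand-ins for a CNN view encoder, a view-estimation subnetwork, and a learned recurrent layer.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def max_view_pool(view_features: Sequence[Vector]) -> Vector:
    """Element-wise max over per-view features: order-invariant, no view selection."""
    return [max(column) for column in zip(*view_features)]


def avg_view_pool(view_features: Sequence[Vector]) -> Vector:
    """Element-wise mean over per-view features: also order-invariant."""
    n = len(view_features)
    return [sum(column) / n for column in zip(*view_features)]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax that turns class scores into probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recurrent_view_selection(
    get_view_feature: Callable[[int], Vector],        # hypothetical view encoder
    score_next_view: Callable[[Vector, int], float],  # hypothetical view-estimation head
    num_positions: int,
    num_glimpses: int,
    start: int = 0,
) -> Tuple[List[int], Vector]:
    """Toy recurrent attention loop: observe a view, update the hidden state,
    then pick the next unvisited view based on that state."""
    # Never request more views than there are distinct viewpoints.
    num_glimpses = min(num_glimpses, num_positions)
    visited = [start]
    current = start
    hidden = [0.0] * len(get_view_feature(start))
    for step in range(num_glimpses):
        x = get_view_feature(current)
        # Toy recurrent update; 0.5 is an arbitrary stand-in for learned weights.
        hidden = [math.tanh(0.5 * h + xi) for h, xi in zip(hidden, x)]
        if step == num_glimpses - 1:
            break  # final view consumed; no further selection needed
        candidates = [v for v in range(num_positions) if v not in visited]
        current = max(candidates, key=lambda v: score_next_view(hidden, v))
        visited.append(current)
    return visited, hidden
```

With pooling, every view must be rendered up front and the fused feature ignores their order. In the recurrent loop, each choice depends on the views observed so far, which is what makes a model of this kind usable for active recognition, where obtaining each additional view has a cost.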
Taxonomy
Topics: 3D Shape Modeling and Analysis · Medical Image Segmentation Techniques · Human Pose and Action Recognition
Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax
