NOVUM: Neural Object Volumes for Robust Object Classification
Artur Jesslen, Guofeng Zhang, Angtian Wang, Wufei Ma, Alan Yuille, Adam Kortylewski

TL;DR
NOVUM introduces a 3D compositional neural object volume model that significantly improves robustness and interpretability in object classification, especially under out-of-distribution conditions, while maintaining real-time performance.
Contribution
The paper presents NOVUM, a novel architecture integrating 3D Gaussian-based object volumes into deep networks for enhanced robustness and interpretability in image classification.
Findings
Superior robustness to out-of-distribution shifts
Enhanced interpretability of object representations
Maintains real-time inference with competitive accuracy
Abstract
Discriminative models for object classification typically learn image-based representations that do not capture the compositional and 3D nature of objects. In this work, we show that explicitly integrating 3D compositional object representations into deep networks for image classification leads to substantially better generalization in out-of-distribution scenarios. In particular, we introduce a novel architecture, referred to as NOVUM, that consists of a feature extractor and a neural object volume for every target object class. Each neural object volume is a composition of 3D Gaussians that emit feature vectors. This compositional object representation allows for a highly robust and fast estimation of the object class by independently matching the features of the 3D Gaussians of each category to features extracted from an input image. Additionally, the object pose can be estimated via…
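The matching step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `classify_by_feature_matching`, the use of cosine similarity, and the "best Gaussian per image location, averaged over locations" scoring rule are all assumptions made for illustration. The only points taken from the abstract are that each class has its own set of Gaussian feature vectors and that each class is scored independently against the image features.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def classify_by_feature_matching(image_features, class_volumes):
    """Score each class independently and return the best one.

    image_features: (N, D) array, one feature vector per image location
                    (assumed to come from some feature extractor).
    class_volumes:  dict mapping class name -> (K, D) array, one feature
                    vector per 3D Gaussian of that class (hypothetical layout).
    """
    feats = l2_normalize(image_features)
    scores = {}
    for name, gaussian_features in class_volumes.items():
        # Cosine similarity between every image location and every Gaussian.
        sim = feats @ l2_normalize(gaussian_features).T  # shape (N, K)
        # Assumed scoring rule: best-matching Gaussian per location,
        # averaged over all locations.
        scores[name] = float(sim.max(axis=1).mean())
    best = max(scores, key=scores.get)
    return best, scores


# Toy usage with synthetic data: image features are noisy copies of the
# "car" Gaussian features, so "car" should receive the highest score.
rng = np.random.default_rng(0)
volumes = {
    "car": rng.normal(size=(8, 16)),
    "plane": rng.normal(size=(8, 16)),
}
image = volumes["car"] + 0.05 * rng.normal(size=(8, 16))
label, class_scores = classify_by_feature_matching(image, volumes)
print(label, class_scores)
```

Because each class is scored on its own, adding a class in this sketch only adds one entry to `class_volumes`; the per-location similarity matrix also shows which Gaussians matched, which is one way to read the interpretability claim.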
Taxonomy
Topics: Advanced Neural Network Applications · 3D Shape Modeling and Analysis · Human Pose and Action Recognition
