Modulate-and-Map: Crossmodal Feature Mapping with Cross-View Modulation for 3D Anomaly Detection
Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti, Luigi Di Stefano

TL;DR
ModMap is a multiview and multimodal framework for 3D anomaly detection that learns crossmodal feature mapping and view-dependent relationships, achieving state-of-the-art results on the SiM3D benchmark.
Contribution
Introduces ModMap, a crossmodal and multiview learning approach for industrial 3D anomaly detection, together with a cross-view training strategy and a publicly released depth encoder.
Findings
Achieves state-of-the-art performance on the SiM3D benchmark.
Surpasses previous methods by wide margins.
Effectively models view-dependent relationships through feature-wise modulation.
Abstract
We present ModMap, a natively multiview and multimodal framework for 3D anomaly detection and segmentation. Unlike existing methods that process views independently, our method draws inspiration from the crossmodal feature mapping paradigm to learn to map features across both modalities and views, while explicitly modelling view-dependent relationships through feature-wise modulation. We introduce a cross-view training strategy that leverages all possible view combinations, enabling effective anomaly scoring through multiview ensembling and aggregation. To process high-resolution 3D data, we train and publicly release a foundational depth encoder tailored to industrial datasets. Experiments on SiM3D, a recent benchmark that introduces the first multiview and multimodal setup for 3D anomaly detection and segmentation, demonstrate that ModMap attains state-of-the-art performance by…
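The abstract names three ingredients: crossmodal feature mapping, feature-wise modulation conditioned on the view, and ensembling over view combinations. The toy NumPy sketch below illustrates how these pieces could fit together; it is not the authors' implementation. Everything in it is an assumption made for illustration: the FiLM-style scale-and-shift form of the modulation, the linear map `w_map`, the per-pair embeddings `pair_emb`, the inclusion of same-view pairs, the cosine-distance score, the mean aggregation, and the simplification that feature locations are aligned across views. Random arrays stand in for real RGB and depth features.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def film_modulate(features, view_embedding, w_gamma, w_beta):
    """Feature-wise modulation (FiLM-style, an assumption): per-channel scale and shift."""
    gamma = view_embedding @ w_gamma  # shape (channels,)
    beta = view_embedding @ w_beta    # shape (channels,)
    return features * (1.0 + gamma) + beta

def anomaly_map(predicted, observed):
    """Per-location score: 1 - cosine similarity between predicted and observed features."""
    num = np.sum(predicted * observed, axis=-1)
    den = np.linalg.norm(predicted, axis=-1) * np.linalg.norm(observed, axis=-1) + 1e-8
    return 1.0 - num / den

# Hypothetical sizes and random stand-ins for extracted features.
n_views, n_locs, channels, emb_dim = 3, 4, 8, 5
rgb_feats = rng.normal(size=(n_views, n_locs, channels))
depth_feats = rng.normal(size=(n_views, n_locs, channels))

# Hypothetical parameters; in practice these would be learned on nominal samples.
w_map = rng.normal(size=(channels, channels)) * 0.1
w_gamma = rng.normal(size=(emb_dim, channels)) * 0.1
w_beta = rng.normal(size=(emb_dim, channels)) * 0.1
pair_emb = rng.normal(size=(n_views, n_views, emb_dim))

# All ordered (source, target) view combinations, same-view pairs included (assumption).
view_pairs = list(itertools.product(range(n_views), repeat=2))

scores = np.zeros((n_views, n_locs))
counts = np.zeros(n_views)
for src, tgt in view_pairs:
    # Modulate source-view RGB features with the embedding of this view pair,
    # then map them to the depth feature space of the target view.
    modulated = film_modulate(rgb_feats[src], pair_emb[src, tgt], w_gamma, w_beta)
    predicted_depth = modulated @ w_map
    scores[tgt] += anomaly_map(predicted_depth, depth_feats[tgt])
    counts[tgt] += 1

# Ensemble: average the maps predicted for each target view from every source view.
ensembled = scores / counts[:, None]
```

In this sketch a location scores high when the depth features predicted from the RGB features disagree with the observed ones, which is the intuition behind crossmodal feature mapping: a mapping fitted only to nominal data tends to fail on anomalous regions.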
