Modulate-and-Map: Crossmodal Feature Mapping with Cross-View Modulation for 3D Anomaly Detection

Alex Costanzino; Pierluigi Zama Ramirez; Giuseppe Lisanti; Luigi Di Stefano

arXiv:2604.02328 · cs.CV · April 3, 2026

TL;DR

ModMap is a multiview and multimodal framework for 3D anomaly detection that learns crossmodal feature mapping and view-dependent relationships, achieving state-of-the-art results on the SiM3D benchmark.

Contribution

Introduces ModMap, a novel crossmodal and multiview learning approach with a new training strategy and a depth encoder for industrial 3D anomaly detection.

Findings

01 Achieves state-of-the-art performance on the SiM3D benchmark.

02 Surpasses previous methods by wide margins.

03 Effectively models view-dependent relationships through feature-wise modulation.

Abstract

We present ModMap, a natively multiview and multimodal framework for 3D anomaly detection and segmentation. Unlike existing methods that process views independently, our method draws inspiration from the crossmodal feature mapping paradigm to learn to map features across both modalities and views, while explicitly modelling view-dependent relationships through feature-wise modulation. We introduce a cross-view training strategy that leverages all possible view combinations, enabling effective anomaly scoring through multiview ensembling and aggregation. To process high-resolution 3D data, we train and publicly release a foundational depth encoder tailored to industrial datasets. Experiments on SiM3D, a recent benchmark that introduces the first multiview and multimodal setup for 3D anomaly detection and segmentation, demonstrate that ModMap attains state-of-the-art performance by…
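The abstract names three ingredients: feature-wise modulation, training over all view combinations, and multiview aggregation of anomaly scores. The toy sketch below illustrates how such pieces could fit together; it is not the authors' implementation. All function names (`film`, `view_pairs`, `anomaly_score`) are hypothetical. The use of ordered pairs of distinct views, cosine distance as the discrepancy, and a plain mean as the aggregation are assumptions made for illustration only.

```python
import math
from itertools import permutations


def film(features, gamma, beta):
    """Feature-wise modulation: scale and shift each channel (FiLM-style)."""
    return [g * f + b for f, g, b in zip(features, gamma, beta)]


def cosine_distance(a, b):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)


def view_pairs(num_views):
    """All ordered (source, target) pairs of distinct views (an assumption)."""
    return list(permutations(range(num_views), 2))


def anomaly_score(observed, predict, num_views):
    """Mean discrepancy between predicted and observed target-view features.

    observed: list of per-view feature vectors.
    predict:  callable (src, dst) -> features predicted for view dst from view src.
    """
    pairs = view_pairs(num_views)
    scores = [cosine_distance(predict(s, d), observed[d]) for s, d in pairs]
    return sum(scores) / len(scores)


def make_identity_predictor(observed):
    """Toy mapper: modulates source features with gamma=1, beta=0 (no change)."""
    def predict(src, dst):
        n = len(observed[src])
        return film(observed[src], [1.0] * n, [0.0] * n)
    return predict


# Nominal sample: all views agree, so predictions match observations.
nominal = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
nominal_score = anomaly_score(nominal, make_identity_predictor(nominal), 3)

# Anomalous sample: the third view deviates, so predictions involving it fail.
anomalous = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
anomalous_score = anomaly_score(anomalous, make_identity_predictor(anomalous), 3)
```

In this toy setting the nominal sample scores near zero, while the anomalous one scores higher because four of the six view pairs involve the deviating view. In a learned system the modulation parameters would come from a trained network conditioned on the view pair, rather than being fixed to the identity.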
