2D-3D Feature Fusion via Cross-Modal Latent Synthesis and Attention Guided Restoration for Industrial Anomaly Detection

Usman Ali; Ali Zia; Abdul Rehman; Umer Ramzan; Zohaib Hassan; Talha Sattar; Jing Wang; Wei Xiang

arXiv:2510.21793 · cs.CV · October 28, 2025

TL;DR

This paper introduces MAFR, an unsupervised multi-modal fusion framework combining 2D and 3D data for industrial anomaly detection, achieving state-of-the-art results and robustness in few-shot scenarios.

Contribution

The paper presents a novel fusion architecture that synthesizes a unified latent space from RGB images and point clouds, improving anomaly detection accuracy.

Findings

01 Achieves state-of-the-art mean I-AUROC scores of 0.972 on MVTec 3D-AD and 0.901 on Eyecandies.

02 Demonstrates strong performance in few-shot learning settings.

03 Ablation studies confirm the critical roles of the fusion architecture and the composite loss.

Abstract

Industrial anomaly detection (IAD) increasingly benefits from integrating 2D and 3D data, but robust cross-modal fusion remains challenging. We propose a novel unsupervised framework, Multi-Modal Attention-Driven Fusion Restoration (MAFR), which synthesises a unified latent space from RGB images and point clouds using a shared fusion encoder, followed by attention-guided, modality-specific decoders. Anomalies are localised by measuring reconstruction errors between input features and their restored counterparts. Evaluations on the MVTec 3D-AD and Eyecandies benchmarks demonstrate that MAFR achieves state-of-the-art results, with a mean I-AUROC of 0.972 and 0.901, respectively. The framework also exhibits strong performance in few-shot learning settings, and ablation studies confirm the critical roles of the fusion architecture and composite loss. MAFR offers a principled approach for…
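
The abstract describes localising anomalies by comparing input features with their restored counterparts. A minimal sketch of that scoring step is given below, assuming NumPy arrays of per-location features with shape (H, W, C). The L2 distance, the summation of the 2D and 3D maps, and the max-based image score are illustrative assumptions, not details confirmed by the paper, and the function names `reconstruction_error_map`, `fuse_error_maps`, and `image_level_score` are hypothetical.

```python
import numpy as np


def reconstruction_error_map(features, restored):
    """Per-location anomaly score: L2 distance over the channel axis.

    Assumption: L2 is used here for illustration; the paper's exact
    distance is not stated in the abstract.
    """
    return np.linalg.norm(features - restored, axis=-1)


def fuse_error_maps(map_2d, map_3d):
    """Combine the 2D and 3D error maps.

    Assumption: plain summation, chosen only to keep the sketch simple.
    """
    return map_2d + map_3d


def image_level_score(anomaly_map):
    """Image-level score as the maximum of the anomaly map (assumption)."""
    return float(anomaly_map.max())


# Toy inputs: a 4x4 grid with 3 RGB-feature channels and 2 point-cloud channels.
rgb_feat = np.zeros((4, 4, 3))
pc_feat = np.zeros((4, 4, 2))

# Pretend the decoders restored everything perfectly except location (1, 2).
rgb_restored = rgb_feat.copy()
pc_restored = pc_feat.copy()
rgb_restored[1, 2] = [3.0, 4.0, 0.0]
pc_restored[1, 2] = [0.0, 2.0]

fused = fuse_error_maps(
    reconstruction_error_map(rgb_feat, rgb_restored),
    reconstruction_error_map(pc_feat, pc_restored),
)
score = image_level_score(fused)
```

In this toy case the only non-zero entry of `fused` sits at the location where the restored features deviate from the inputs, which is the behaviour a reconstruction-based detector relies on: normal regions are restored faithfully, while anomalous regions are not.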
