SGANet: Semantic and Geometric Alignment for Multimodal Multi-view Anomaly Detection

Letian Bai; Chengyu Tao; Juan Du

arXiv:2604.05632·cs.CV·April 8, 2026

TL;DR

SGANet is a framework for multimodal multi-view anomaly detection that aligns semantic and geometric features across viewpoints and modalities, and it reports state-of-the-art results on the SiM3D and Eyecandies datasets.

Contribution

The paper introduces a unified framework that combines semantic and geometric alignment modules so that features stay consistent across views and modalities, with the aim of more accurate anomaly detection.

Findings

01

SGANet outperforms existing methods on SiM3D and Eyecandies datasets.

02

The framework effectively enhances anomaly detection and localization accuracy.

03

Extensive experiments validate its robustness in industrial scenarios.

Abstract

Multi-view anomaly detection aims to identify surface defects on complex objects using observations captured from multiple viewpoints. However, existing unsupervised methods often suffer from feature inconsistency arising from viewpoint variations and modality discrepancies. To address these challenges, we propose a Semantic and Geometric Alignment Network (SGANet), a unified framework for multimodal multi-view anomaly detection that effectively combines semantic and geometric alignment to learn physically coherent feature representations across viewpoints and modalities. SGANet consists of three key components. The Selective Cross-view Feature Refinement Module (SCFRM) selectively aggregates informative patch features from adjacent views to enhance cross-view feature interaction. The Semantic-Structural Patch Alignment (SSPA) enforces semantic alignment across modalities while…
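The abstract describes SCFRM only at a high level: it selectively aggregates informative patch features from adjacent views. As a rough illustration of that idea, and not the authors' implementation, the sketch below picks the top-k most similar patches from neighboring views and blends them into each patch. Everything specific here is an assumption made for the example: the function name `refine_with_adjacent_views`, the ring-shaped view adjacency, cosine similarity as the selection criterion, softmax weighting, and the parameters `k` and `alpha`.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def refine_with_adjacent_views(feats, k=3, alpha=0.5):
    """Hypothetical top-k cross-view patch aggregation.

    feats: array of shape (V, N, D) with V views, N patches per view,
           and D feature channels.
    k:     number of neighbor patches kept per query patch (assumed).
    alpha: blend weight between original and aggregated feature (assumed).
    """
    feats = np.asarray(feats, dtype=float)
    V, N, D = feats.shape
    normed = l2_normalize(feats)
    refined = np.empty_like(feats)

    for v in range(V):
        # Assumption: views are arranged in a ring, so each view has up to
        # two adjacent views. The paper may define adjacency differently.
        neighbors = sorted({(v - 1) % V, (v + 1) % V} - {v})
        if not neighbors:
            refined[v] = feats[v]
            continue

        # Pool candidate patches from all adjacent views: shape (M, D).
        cand = np.concatenate([feats[u] for u in neighbors], axis=0)
        cand_n = np.concatenate([normed[u] for u in neighbors], axis=0)

        # Cosine similarity between each patch in view v and each candidate.
        sim = normed[v] @ cand_n.T  # (N, M)

        # Keep only the k most similar candidates per patch.
        kk = min(k, sim.shape[1])
        top_idx = np.argpartition(-sim, kk - 1, axis=1)[:, :kk]  # (N, kk)
        top_sim = np.take_along_axis(sim, top_idx, axis=1)

        # Softmax over the selected similarities gives aggregation weights.
        w = np.exp(top_sim - top_sim.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)

        # Weighted sum of the selected candidate features: (N, D).
        agg = np.einsum("nk,nkd->nd", w, cand[top_idx])

        # Residual blend keeps the original patch information.
        refined[v] = (1.0 - alpha) * feats[v] + alpha * agg

    return refined
```

Calling `refine_with_adjacent_views(feats, k=3, alpha=0.5)` on an array of shape `(V, N, D)` returns an array of the same shape. Setting `alpha=0.0` returns the input unchanged, which is a convenient sanity check when experimenting with the selection rule.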
