Align3D-AD: Cross-Modal Feature Alignment and Dual-Prompt Learning for Zero-shot 3D Anomaly Detection
Letian Bai, Xuanming Cao, Juan Du, Chengyu Tao

TL;DR
Align3D-AD introduces a cross-modal feature alignment and dual-prompt learning framework for zero-shot 3D anomaly detection, designed to bridge the domain gap between rendered 3D representations and the RGB semantics that pretrained vision encoders expect.
Contribution
It proposes a two-stage approach that aligns 3D rendering features with RGB semantics and uses modality-aware (dual) prompts to improve zero-shot anomaly detection.
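Zero-shot detectors built on vision-language encoders commonly score each patch by comparing its feature with text embeddings of a "normal" prompt and an "anomalous" prompt. The sketch below shows that generic scoring step and one hypothetical way of combining several modality-specific prompt pairs. It assumes features and prompt embeddings are already computed and share a dimension. The names `anomaly_prob` and `multi_prompt_score`, the temperature value, and the equal-weight averaging are illustrative assumptions, not the authors' dual-prompt design.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def anomaly_prob(features, normal_emb, anomal_emb, temperature=0.07):
    """Per-patch anomaly probability from one (normal, anomalous) prompt pair.

    features: (N, D) patch features; normal_emb, anomal_emb: (D,) prompt embeddings.
    Returns an (N,) array of probabilities in [0, 1].
    """
    f = l2_normalize(features)
    t = l2_normalize(np.stack([normal_emb, anomal_emb]))  # (2, D)
    logits = f @ t.T / temperature                       # (N, 2) scaled cosine sims
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)            # softmax over {normal, anomalous}
    return probs[:, 1]

def multi_prompt_score(branches):
    """Hypothetical fusion: each branch pairs its features with its own prompt pair.

    branches: list of (features, (normal_emb, anomal_emb)); returns the mean probability.
    """
    scores = [anomaly_prob(feats, *prompts) for feats, prompts in branches]
    return np.mean(scores, axis=0)

rng = np.random.default_rng(0)
D = 8
normal, anomal = rng.normal(size=D), rng.normal(size=D)
patches = np.stack([normal, anomal])  # one "normal-like" and one "anomaly-like" patch
print(anomaly_prob(patches, normal, anomal))
```

A patch that matches the anomalous prompt more closely than the normal one receives a probability above 0.5. Keeping a separate prompt pair per branch lets each feature space be scored against text embeddings tuned to it, which is one plausible reading of "modality-aware" prompts.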
Findings
Outperforms existing zero-shot methods on multiple datasets
Demonstrates strong generalization across different datasets and settings
Effectively captures complementary semantics across modalities
Abstract
Zero-shot 3D anomaly detection aims to identify anomalies without access to training data from the target categories. However, existing methods mainly project 3D observations into multi-view representations, which capture geometric cues rather than realistic visual semantics, and then process these representations with vision encoders pretrained on RGB data. This creates a significant domain gap between the encoder and the projected representations. To address this issue, we propose Align3D-AD, a unified two-stage framework that uses the RGB modality of auxiliary categories as cross-modal guidance for zero-shot 3D anomaly detection. First, we introduce a cross-modal feature alignment paradigm that maps rendering features into the RGB semantic space. Unlike prior works that rely implicitly on pretrained encoders, our method enables direct semantic transfer from RGB observations. A semantic…
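The abstract states that rendering features are mapped into the RGB semantic space, but this excerpt does not give the mapping or its training objective. As a stand-in, the sketch below fits a linear map from paired rendering features to RGB features with closed-form ridge regression. That is a deliberately simple substitute technique, not the authors' method. The paired-feature setup and the names `fit_linear_alignment` and `mean_cosine` are hypothetical.

```python
import numpy as np

def fit_linear_alignment(render_feats, rgb_feats, ridge=1e-3):
    """Fit W minimizing ||render_feats @ W - rgb_feats||^2 + ridge * ||W||^2.

    render_feats: (N, D_r) and rgb_feats: (N, D_c), where row i of each comes from
    the same object/view (an assumed pairing). Returns W with shape (D_r, D_c).
    """
    d = render_feats.shape[1]
    gram = render_feats.T @ render_feats + ridge * np.eye(d)  # regularized normal equations
    return np.linalg.solve(gram, render_feats.T @ rgb_feats)

def mean_cosine(x, y, eps=1e-8):
    """Average row-wise cosine similarity between two (N, D) arrays."""
    num = np.sum(x * y, axis=1)
    den = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1) + eps
    return float(np.mean(num / den))

# Synthetic check: RGB features are an exact linear function of rendering features.
rng = np.random.default_rng(0)
X_render = rng.normal(size=(200, 16))
W_true = rng.normal(size=(16, 12))
X_rgb = X_render @ W_true
W = fit_linear_alignment(X_render, X_rgb, ridge=1e-6)
print(mean_cosine(X_render @ W, X_rgb))
```

After fitting, projected rendering features can be compared with RGB-space quantities such as text prompt embeddings. A learned nonlinear projector trained with a contrastive or cosine loss would be the more usual choice in practice. The linear version is shown only because its behavior is easy to verify.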
