Align3D-AD: Cross-Modal Feature Alignment and Dual-Prompt Learning for Zero-shot 3D Anomaly Detection

Letian Bai; Xuanming Cao; Juan Du; Chengyu Tao

arXiv:2605.05850·cs.CV·May 8, 2026

TL;DR

Align3D-AD is a cross-modal feature alignment and dual-prompt learning framework for zero-shot 3D anomaly detection. It targets the domain gap between projected 3D representations and vision encoders pretrained on RGB data.

Contribution

The paper proposes a two-stage approach that first maps 3D rendering features into the RGB semantic space, then employs modality-aware prompts for zero-shot anomaly detection.

Findings

01. Outperforms existing zero-shot methods on multiple datasets

02. Demonstrates strong generalization across different datasets and settings

03. Effectively captures complementary semantics across modalities

Abstract

Zero-shot 3D anomaly detection aims to identify anomalies without access to training data from target categories. However, existing methods mainly rely on projecting 3D observations into multi-view representations that primarily capture geometric cues rather than realistic visual semantics and process them with vision encoders pretrained on RGB data, leading to a significant domain gap between the encoder and the projected representations. To address this issue, we propose Align3D-AD, a unified two-stage framework that leverages the RGB modality from auxiliary categories as cross-modal guidance for zero-shot 3D anomaly detection. First, we introduce a cross-modal feature alignment paradigm that maps rendering features into the RGB semantic space. Unlike prior works that implicitly rely on pretrained encoders, our method enables direct semantic transfer from RGB observations. A semantic…
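
The idea of mapping rendering features into the RGB semantic space can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the features are synthetic, the function names are hypothetical, and a linear map fit by ridge regression stands in for the alignment module, whose actual design is not described in the text above.

```python
import numpy as np


def fit_linear_alignment(render_feats, rgb_feats, ridge=1e-3):
    """Fit W minimizing ||render_feats @ W - rgb_feats||^2 + ridge * ||W||^2.

    Hypothetical stand-in for an alignment module: a single linear map
    from the rendering feature space to the RGB feature space.
    """
    dim = render_feats.shape[1]
    gram = render_feats.T @ render_feats + ridge * np.eye(dim)
    return np.linalg.solve(gram, render_feats.T @ rgb_feats)


def mean_cosine_similarity(a, b):
    """Average row-wise cosine similarity between two feature matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))


rng = np.random.default_rng(0)

# Synthetic "RGB" features for paired samples from auxiliary categories.
rgb = rng.normal(size=(256, 16))

# Synthetic "rendering" features: the same content under a random rotation
# plus small noise, mimicking a domain shift between the two modalities.
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
render = rgb @ rotation + 0.01 * rng.normal(size=(256, 16))

W = fit_linear_alignment(render, rgb)

before = mean_cosine_similarity(render, rgb)      # unaligned features
after = mean_cosine_similarity(render @ W, rgb)   # features mapped into RGB space
print(f"cosine similarity before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the mapped rendering features sit much closer to their RGB counterparts than the raw ones do, which is the effect an alignment stage is meant to produce before any anomaly scoring takes place.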
