H2VLR: Heterogeneous Hypergraph Vision-Language Reasoning for Few-Shot Anomaly Detection
Jianghong Huang, Luping Ji, Weiwei Duan, Mao Ye

TL;DR
H2VLR introduces a hypergraph-based reasoning framework that models high-order visual-semantic relations to improve few-shot anomaly detection in industrial and medical imaging.
Contribution
It presents a novel hypergraph reasoning approach that captures structural dependencies, surpassing pairwise methods in VLM-based FSAD tasks.
Findings
Achieves state-of-the-art performance on industrial benchmarks.
Effectively models high-order visual-semantic relations.
Outperforms existing pairwise matching schemes.
Abstract
Anomaly detection is a classic vision task widely applied in industrial inspection and medical imaging. Data scarcity is a frequent problem in this setting, and few-shot anomaly detection (FSAD) is attracting increasing attention as a way to address it. In recent years, Vision-Language Models (VLMs) have been extensively explored as an alternative to the traditional purely visual paradigm. However, almost all existing VLM-based FSAD schemes perform anomaly inference only by pairwise feature matching, ignoring structural dependencies and global consistency. To further improve VLM-based FSAD, we propose a Heterogeneous Hypergraph Vision-Language Reasoning (H2VLR) framework. It reformulates FSAD as a high-order inference problem over visual-semantic relations by jointly modeling visual regions and semantic concepts in a unified hypergraph. Experimental…
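To make the idea of "jointly modeling visual regions and semantic concepts in a unified hypergraph" more concrete, here is a minimal sketch in Python with NumPy. It is not the H2VLR implementation, whose details are not given in the text above. It assumes a hypothetical construction in which each semantic concept forms one hyperedge containing itself and its k most similar visual patches, and it applies one step of the generic normalized hypergraph convolution X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X. The function names `build_incidence` and `hypergraph_conv`, the top-k rule, and the random features are all illustrative assumptions.

```python
import numpy as np


def build_incidence(patch_feats, concept_feats, k=3):
    """Hypothetical construction (an assumption, not the paper's rule):
    one hyperedge per concept, joining that concept node with its k most
    cosine-similar patch nodes. Rows of H are nodes (patches first, then
    concepts); columns are hyperedges."""
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    c = concept_feats / np.linalg.norm(concept_feats, axis=1, keepdims=True)
    sim = c @ p.T  # cosine similarity, shape (n_concepts, n_patches)
    n_p, n_c = p.shape[0], c.shape[0]
    H = np.zeros((n_p + n_c, n_c))
    for e in range(n_c):
        top = np.argsort(-sim[e])[:k]  # indices of the k most similar patches
        H[top, e] = 1.0                # visual members of hyperedge e
        H[n_p + e, e] = 1.0            # the concept node itself
    return H


def hypergraph_conv(X, H, edge_w=None):
    """One generic hypergraph propagation step (no learnable weights):
    Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X.
    Nodes that belong to no hyperedge receive all-zero output features."""
    w = np.ones(H.shape[1]) if edge_w is None else np.asarray(edge_w, float)
    dv = H @ w            # weighted node degrees
    de = H.sum(axis=0)    # hyperedge degrees (>= 1 by construction above)
    dv_inv_sqrt = np.zeros_like(dv)
    mask = dv > 0
    dv_inv_sqrt[mask] = dv[mask] ** -0.5
    left = dv_inv_sqrt[:, None] * H * (w / de)[None, :]   # (n_nodes, n_edges)
    A = left @ (H.T * dv_inv_sqrt[None, :])               # (n_nodes, n_nodes)
    return A @ X


# Toy usage with random stand-ins for patch and text embeddings.
rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 4))    # 6 visual regions, 4-dim features
concepts = rng.normal(size=(2, 4))   # 2 semantic concepts, same dimension
H = build_incidence(patches, concepts, k=3)
X = np.vstack([patches, concepts])
out = hypergraph_conv(X, H)
```

Each hyperedge here ties several patches to a concept at once, so a single propagation step mixes information across the whole group rather than across one patch-concept pair at a time. That is the sense in which hypergraph reasoning is "high-order" compared with pairwise feature matching.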
