H2VLR: Heterogeneous Hypergraph Vision-Language Reasoning for Few-Shot Anomaly Detection

Jianghong Huang; Luping Ji; Weiwei Duan; Mao Ye

arXiv:2604.14507 · cs.CV · April 17, 2026

TL;DR

H2VLR introduces a hypergraph-based reasoning framework that models high-order visual-semantic relations to improve few-shot anomaly detection in industrial and medical imaging.

Contribution

It presents a novel hypergraph reasoning approach that captures structural dependencies among visual regions and semantic concepts, surpassing pairwise matching methods in vision-language model (VLM)-based few-shot anomaly detection (FSAD).
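To make "high-order" concrete: in a hypergraph, one hyperedge can join any number of nodes, so a visual region can be tied to a semantic concept and to several other regions at once instead of one pair at a time. This page does not describe how H2VLR builds or propagates over its hypergraph, so the following is only a minimal, dependency-free sketch under stated assumptions, not the authors' method. It assumes patch and concept embeddings share one space, uses a hypothetical rule in which each hyperedge joins one concept with its k most similar patches, and applies one step of the standard HGNN propagation rule X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X with the learnable projection and nonlinearity omitted. All function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Nested-list matrix product (a: n x k, b: k x d)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def build_incidence(patches: Matrix, concepts: Matrix, k: int) -> Matrix:
    """Hypothetical rule: hyperedge j joins concept j with its k most similar patches.
    Rows are nodes (all patches first, then all concepts); columns are hyperedges."""
    n_p = len(patches)
    incidence = [[0.0] * len(concepts) for _ in range(n_p + len(concepts))]
    for j, c in enumerate(concepts):
        ranked = sorted(range(n_p), key=lambda i: -dot(patches[i], c))
        for i in ranked[:k]:
            incidence[i][j] = 1.0
        incidence[n_p + j][j] = 1.0  # the concept node itself
    return incidence


def hypergraph_conv(x: Matrix, incidence: Matrix, edge_w: List[float]) -> Matrix:
    """One HGNN-style step without projection or nonlinearity:
    X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X."""
    n, m = len(incidence), len(incidence[0])
    # Node degree: weighted number of hyperedges containing each node.
    dv = [sum(edge_w[e] * incidence[v][e] for e in range(m)) for v in range(n)]
    # Edge degree: number of nodes in each hyperedge.
    de = [sum(incidence[v][e] for v in range(n)) for e in range(m)]
    dv_isqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in dv]
    # Left factor Dv^{-1/2} H W De^{-1}, shape n x m.
    left = [[dv_isqrt[v] * incidence[v][e] * edge_w[e] / de[e] if de[e] > 0 else 0.0
             for e in range(m)] for v in range(n)]
    # Right factor H^T Dv^{-1/2} X, shape m x d.
    scaled_x = [[dv_isqrt[v] * val for val in x[v]] for v in range(n)]
    h_t = [list(col) for col in zip(*incidence)]
    right = matmul(h_t, scaled_x)
    return matmul(left, right)
```

For example, with patches [[1, 0], [0.9, 0.1], [0, 1]], concepts [[1, 0], [0, 1]] and k = 2, the middle patch falls into both hyperedges, so a single propagation step lets it aggregate evidence from both concept groups simultaneously, a relation that a single pairwise similarity cannot express.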

Findings

01. Achieves state-of-the-art performance on industrial benchmarks.
02. Effectively models high-order visual-semantic relations.
03. Outperforms existing pairwise matching schemes.

Abstract

As a classic vision task, anomaly detection has been widely applied in industrial inspection and medical imaging. In this task, data scarcity is a frequent issue. To address it, the few-shot anomaly detection (FSAD) scheme is attracting increasing attention. In recent years, beyond the traditional visual paradigm, Vision-Language Models (VLMs) have been extensively explored to advance this field. However, almost all existing VLM-based FSAD schemes perform anomaly inference solely through pairwise feature matching, ignoring structural dependencies and global consistency. To further advance VLM-based FSAD, we propose a Heterogeneous Hypergraph Vision-Language Reasoning (H2VLR) framework. It reformulates FSAD as a high-order inference problem over visual-semantic relations by jointly modeling visual regions and semantic concepts in a unified hypergraph. Experimental…
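For contrast, the pairwise feature matching that the abstract criticizes can be sketched as follows. This is a generic, hypothetical illustration of two common scoring routes, nearest-neighbour comparison against a few-shot bank of normal patch embeddings and a two-way softmax against "normal" and "anomalous" text-prompt embeddings; it is not the scoring of any specific published method. The function names, the temperature of 0.07, and the mixing weight alpha are assumptions, and all embeddings are assumed to be non-zero vectors in a shared space.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def visual_pairwise_score(patch: Vector, normal_bank: List[Vector]) -> float:
    """1 minus the best cosine match against the few-shot normal patch bank."""
    return 1.0 - max(cosine(patch, ref) for ref in normal_bank)


def text_pairwise_score(patch: Vector, normal_text: Vector, anomaly_text: Vector,
                        temperature: float = 0.07) -> float:
    """Probability of 'anomalous' from a two-way softmax over prompt similarities."""
    s_n = cosine(patch, normal_text) / temperature
    s_a = cosine(patch, anomaly_text) / temperature
    m = max(s_n, s_a)  # subtract the max for numerical stability
    e_n, e_a = math.exp(s_n - m), math.exp(s_a - m)
    return e_a / (e_n + e_a)


def pairwise_anomaly_map(patches: List[Vector], normal_bank: List[Vector],
                         normal_text: Vector, anomaly_text: Vector,
                         alpha: float = 0.5) -> List[float]:
    """Score every patch independently: no patch-patch or concept-concept coupling."""
    return [alpha * visual_pairwise_score(p, normal_bank)
            + (1.0 - alpha) * text_pairwise_score(p, normal_text, anomaly_text)
            for p in patches]
```

Each patch score depends only on that patch and a fixed set of references; nothing couples neighbouring regions or enforces consistency across the image, which is the gap a hypergraph formulation is meant to close.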
