DQEN: Dual Query Enhancement Network for DETR-based HOI Detection

Zhehao Li; Chong Wang; Yi Chen; Yinghao Lu; Jiangbo Qian; Jiong Wang; and Jiafei Wu

arXiv:2508.18896 · cs.CV · August 27, 2025

TL;DR

DQEN is a dual query enhancement approach for DETR-based HOI detection. It replaces vague, randomly initialized queries with object queries enhanced by object-aware encoder features and interaction queries enhanced by semantic fusion, and adds auxiliary interaction prediction, reaching competitive results on HICO-Det and V-COCO.

Contribution

The paper proposes a novel Dual Query Enhancement Network (DQEN) that enhances object and interaction queries using object-aware features and semantic fusion, addressing limitations of random query initialization in DETR-based HOI detection.

Findings

01. Achieves competitive results on HICO-Det and V-COCO datasets.

02. Enhances interaction query initialization with CLIP-based semantic features.

03. Improves detection accuracy by auxiliary interaction feature prediction.

Abstract

Human-Object Interaction (HOI) detection focuses on localizing human-object pairs and recognizing their interactions. Recently, the DETR-based framework has been widely adopted in HOI detection. In DETR-based HOI models, queries with clear meaning are crucial for accurately detecting HOIs. However, prior works have typically relied on randomly initialized queries, leading to vague representations that limit the model's effectiveness. Meanwhile, humans in the HOI categories are fixed, while objects and their interactions are variable. Therefore, we propose a Dual Query Enhancement Network (DQEN) to enhance object and interaction queries. Specifically, object queries are enhanced with object-aware encoder features, enabling the model to focus more effectively on humans interacting with objects in an object-aware way. On the other hand, we design a novel Interaction Semantic Fusion module…
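The abstract's idea of enhancing object queries with object-aware encoder features can be illustrated with a small sketch. This is not the authors' implementation: the per-token `objectness` score, the top-k selection, and the element-wise addition are all assumptions made for illustration, and the function names are hypothetical. It only shows one plausible way a query could pick up image-specific content instead of staying a content-free initialization.

```python
import random


def select_top_tokens(encoder_tokens, objectness, k):
    """Return the k encoder tokens with the highest (hypothetical) objectness score."""
    order = sorted(range(len(encoder_tokens)),
                   key=lambda i: objectness[i], reverse=True)
    return [encoder_tokens[i] for i in order[:k]]


def enhance_queries(queries, encoder_tokens, objectness):
    """Add one selected encoder token to each query, element-wise.

    Assumes there are at least as many encoder tokens as queries.
    """
    selected = select_top_tokens(encoder_tokens, objectness, len(queries))
    return [[q + t for q, t in zip(query, token)]
            for query, token in zip(queries, selected)]


rng = random.Random(0)
dim, num_tokens, num_queries = 4, 6, 2

# Toy encoder output: one feature vector per image token.
encoder_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
# Assumed per-token scores; a real model would have to predict these.
objectness = [0.1, 0.9, 0.3, 0.8, 0.2, 0.05]
# Stand-in for content-free initial queries (zeros keep the effect easy to see).
queries = [[0.0] * dim for _ in range(num_queries)]

enhanced = enhance_queries(queries, encoder_tokens, objectness)
```

With zero-valued initial queries, each enhanced query equals the encoder token it was paired with (here tokens 1 and 3, the two highest-scoring ones), which makes it easy to see that the query now carries content from the image.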
