Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection

Xubin Zhong, Changxing Ding, Zijian Li, and Shaoli Huang

arXiv:2207.05293·cs.CV·July 13, 2022·1 cites

TL;DR

This paper introduces a novel method for enhancing DETR-based human-object interaction detection by mining hard-positive queries, which improves robustness and achieves state-of-the-art results on multiple benchmarks.

Contribution

It proposes explicit and implicit hard-positive query mining techniques to improve DETR's robustness in HOI detection, an approach not previously explored.

Findings

01. Achieves state-of-the-art performance on the HICO-DET, V-COCO, and HOI-A benchmarks.

02. Enhances DETR's robustness to object location changes.

03. Widely applicable to existing DETR-based HOI detectors.

Abstract

Human-Object Interaction (HOI) detection is a core task for high-level image understanding. Recently, Detection Transformer (DETR)-based HOI detectors have become popular due to their superior performance and efficient structure. However, these approaches typically adopt fixed HOI queries for all testing images, which makes them vulnerable to changes in object location within a specific image. Accordingly, in this paper, we propose to enhance DETR's robustness by mining hard-positive queries, which are forced to make correct predictions using partial visual cues. First, we explicitly compose hard-positive queries according to the ground-truth (GT) positions of the labeled human-object pairs in each training image. Specifically, we shift the GT bounding boxes of each labeled human-object pair so that the shifted boxes cover only a certain portion of the GT ones. We encode the coordinates of the…
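The box-shifting step described in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names `coverage` and `shift_box`, the (x1, y1, x2, y2) box format, the use of pure translation, and the 0.4 to 0.6 coverage range are all assumptions made for illustration. The sketch picks a target coverage, splits it into a kept fraction along each axis, and translates the box accordingly, so the shifted box overlaps exactly that fraction of the GT box. Clipping to the image boundary is ignored.

```python
import random


def coverage(gt, box):
    """Fraction of the GT box area that `box` overlaps (boxes are x1, y1, x2, y2)."""
    inter_w = max(0.0, min(gt[2], box[2]) - max(gt[0], box[0]))
    inter_h = max(0.0, min(gt[3], box[3]) - max(gt[1], box[1]))
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return (inter_w * inter_h) / gt_area


def shift_box(gt, low=0.4, high=0.6, rng=None):
    """Translate a GT box so it covers a fraction in [low, high] of the original.

    The [low, high] range is an illustrative assumption, not a value from the paper.
    """
    rng = rng or random.Random()
    x1, y1, x2, y2 = gt
    w, h = x2 - x1, y2 - y1
    target = rng.uniform(low, high)   # desired covered fraction of the GT area
    keep_x = rng.uniform(target, 1.0) # fraction of the width that still overlaps
    keep_y = target / keep_x          # chosen so keep_x * keep_y == target
    dx = rng.choice((-1, 1)) * (1.0 - keep_x) * w
    dy = rng.choice((-1, 1)) * (1.0 - keep_y) * h
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


# Shift both boxes of one hypothetical labeled human-object pair.
rng = random.Random(0)
human_gt = (10.0, 20.0, 110.0, 220.0)
object_gt = (90.0, 150.0, 170.0, 210.0)
shifted_pair = (shift_box(human_gt, rng=rng), shift_box(object_gt, rng=rng))
```

Because the shift is a pure translation, the box size is unchanged and only its position moves, so a query built from the shifted coordinates sees part of the person or object rather than all of it.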

Code & Models

Repositories

muchhair/hqm (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Absolute Position Encodings · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Adam