RegFormer: Transferable Relational Grounding for Efficient Weakly-Supervised Human-Object Interaction Detection

Jihwan Park; Chanhyeong Yang; Jinyoung Park; Taehoon Song; Hyunwoo J. Kim

arXiv:2604.00507 · cs.CV · April 2, 2026

TL;DR

RegFormer introduces a spatially grounded transformer module that enhances weakly-supervised human-object interaction detection by enabling efficient, accurate, and transferable instance-level reasoning from image-level annotations.

Contribution

The paper proposes a relational grounding transformer (RegFormer) that learns localized interaction cues, improving the efficiency and accuracy of weakly-supervised HOI detection without extra training.

Findings

01. Achieves performance comparable to fully supervised models.

02. Operates with high efficiency due to localized reasoning.

03. Effectively transfers from image-level to instance-level reasoning.

Abstract

Weakly-supervised Human-Object Interaction (HOI) detection is essential for scalable scene understanding, as it learns interactions from only image-level annotations. Due to the lack of localization signals, prior works typically rely on an external object detector to generate candidate pairs and then infer their interactions through pairwise reasoning. However, this framework often struggles to scale due to the substantial computational cost incurred by enumerating numerous instance pairs. In addition, it suffers from false positives arising from non-interactive combinations, which hinder accurate instance-level HOI reasoning. To address these issues, we introduce Relational Grounding Transformer (RegFormer), a versatile interaction recognition module for efficient and accurate HOI reasoning. Under image-level supervision, RegFormer leverages spatially grounded signals as guidance for…
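
The scaling problem the abstract attributes to prior detector-based pipelines can be illustrated with a minimal sketch. This is not RegFormer's method and not any specific prior work's code; the function name `enumerate_candidate_pairs`, the detection dictionaries, and the `"person"` label are hypothetical, chosen only to show how pairing every detected human with every other detected instance grows multiplicatively.

```python
from itertools import product


def enumerate_candidate_pairs(detections, human_label="person"):
    """Pair each detected human with every other detected instance.

    `detections` is a hypothetical list of dicts with a "label" key,
    standing in for the output of an external object detector.
    Returns (human_index, object_index) tuples.
    """
    humans = [i for i, d in enumerate(detections) if d["label"] == human_label]
    # A human may interact with any other instance, including another human,
    # so only self-pairs are excluded.
    return [(h, o) for h, o in product(humans, range(len(detections))) if h != o]


# Illustrative detections: 3 humans and 3 objects (made-up example data).
detections = [
    {"label": "person"}, {"label": "person"}, {"label": "person"},
    {"label": "bicycle"}, {"label": "cup"}, {"label": "bench"},
]
pairs = enumerate_candidate_pairs(detections)
print(len(pairs))  # 3 humans x 5 other instances = 15 candidate pairs
```

With H humans among N detections, this enumeration yields H × (N − 1) candidates, each of which a pairwise reasoning stage would have to score, and most of which are typically non-interactive. That is the cost and false-positive source the abstract describes.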

Code & Models

Repositories

mlvlab/RegFormer (GitHub)
