Streamlined Open-Vocabulary Human-Object Interaction Detection

Chang Sun; Dongliang Liao; Changxing Ding

arXiv:2603.27500·cs.CV·March 31, 2026

TL;DR

SL-HOI is a streamlined open-vocabulary human-object interaction (HOI) detection framework built solely on DINOv3. It reports state-of-the-art results by bridging the representation gap between localization and classification while keeping DINOv3 frozen and training only a small number of additional parameters.

Contribution

The paper introduces SL-HOI, a novel framework that uses a frozen DINOv3 model with minimal additional parameters for efficient open-vocabulary HOI detection.

Findings

01 Achieves state-of-the-art performance on SWiG-HOI and HICO-DET benchmarks.

02 Effectively bridges representation gaps between localization and classification components.

03 Uses a simple yet effective architecture with frozen DINOv3 parameters.

Abstract

Open-vocabulary human-object interaction (HOI) detection aims to localize and recognize all human-object interactions in an image, including those unseen during training. Existing approaches usually rely on the collaboration between a conventional HOI detector and a Vision-Language Model (VLM) to recognize unseen HOI categories. However, feature fusion in this paradigm is challenging due to significant gaps in cross-model representations. To address this issue, we introduce SL-HOI, a StreamLined open-vocabulary HOI detection framework based solely on the powerful DINOv3 model. Our design leverages the complementary strengths of DINOv3's components: its backbone for fine-grained localization and its text-aligned vision head for open-vocabulary interaction classification. Moreover, to facilitate smooth cross-attention between the interaction queries and the vision head's output, we…
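
The open-vocabulary classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it only assumes that interaction queries and HOI category names end up in a shared, text-aligned embedding space, so that a query can be scored against any category by cosine similarity, including categories added only at inference time. The function names, the three-dimensional embeddings, and the category labels are all hypothetical toy values.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_scores(query, text_embeddings):
    """Cosine similarity between one interaction query and each category text embedding."""
    q = l2_normalize(query)
    return {
        label: sum(a * b for a, b in zip(q, l2_normalize(emb)))
        for label, emb in text_embeddings.items()
    }


def classify_interaction(query, text_embeddings):
    """Return the best-matching category label together with all scores."""
    scores = cosine_scores(query, text_embeddings)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical text embeddings for categories seen during training.
text_embeddings = {
    "ride bicycle": [1.0, 0.0, 0.0],
    "hold cup": [0.0, 1.0, 0.0],
}

# An unseen category is supported by adding its text embedding at inference;
# no classifier weights are retrained in this sketch.
text_embeddings["feed horse"] = [0.0, 0.0, 1.0]

# Hypothetical interaction query embedding for one human-object pair.
query = [0.1, 0.2, 0.9]
label, scores = classify_interaction(query, text_embeddings)
print(label, round(scores[label], 3))
```

Because the category set is just a dictionary of text embeddings, extending the vocabulary is a data change rather than a model change, which is the property that makes this style of classifier open-vocabulary.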

Code & Models

Repositories

MPI-Lab/SL-HOI
github

Models

Thatmakes11/SL-HOI-weights
Hugging Face model
