Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration

Ting Lei; Shaofeng Yin; Qingchao Chen; Yuxin Peng; Yang Liu

arXiv:2508.03207 · cs.CV · August 6, 2025

TL;DR

This paper introduces INP-CC, an end-to-end method for open-vocabulary human-object interaction (HOI) detection that uses interaction-aware prompts and concept calibration to improve detection of novel interactions.

Contribution

The paper presents an interaction-aware prompt generator and language-model-guided concept calibration, both aimed at improving HOI detection on interaction classes beyond the training set.

Findings

01. Outperforms state-of-the-art methods on the SWIG-HOI and HICO-DET datasets.

02. Improves differentiation between similar HOI concepts.

03. Improves detection of unseen interaction classes.

Abstract

Open Vocabulary Human-Object Interaction (HOI) detection aims to detect interactions between humans and objects while generalizing to novel interaction classes beyond the training set. Current methods often rely on Vision and Language Models (VLMs) but face challenges due to suboptimal image encoders, as image-level pre-training does not align well with the fine-grained region-level interaction detection required for HOI. Additionally, effectively encoding textual descriptions of visual appearances remains difficult, limiting the model's ability to capture detailed HOI relationships. To address these issues, we propose INteraction-aware Prompting with Concept Calibration (INP-CC), an end-to-end open-vocabulary HOI detector that integrates interaction-aware prompts and concept calibration. Specifically, we propose an interaction-aware prompt generator that dynamically generates a compact…
