Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection

Francesco Tonini; Lorenzo Vaquero; Alessandro Conti; Cigdem Beyan; Elisa Ricci

arXiv:2507.17456·cs.CV·July 24, 2025

TL;DR

This paper introduces DYSCO, a training-free framework that leverages multimodal semantic representations and a novel attention mechanism to improve human-object interaction detection, especially for rare interactions.

Contribution

The paper proposes a new training-free HOI detection method that improves semantic alignment using multimodal interaction signatures and a multi-head attention mechanism.
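
To make the idea of scoring against multimodal interaction signatures concrete, here is a minimal sketch. It is not the DYSCO implementation: the registry layout, the `score_interactions` function, the mixing weight `alpha`, and the toy 3-dimensional vectors are all assumptions made for illustration, and the multi-head attention component is left out entirely. The sketch only shows how a human-object pair feature could be compared, without any training, against a textual and a visual signature stored for each interaction.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_interactions(pair_feature, registry, alpha=0.5):
    """Score one human-object pair against every interaction in the registry.

    Hypothetical scheme: each score is a weighted mix of the similarity to the
    interaction's text signature and to its visual signature.
    """
    return {
        name: alpha * cosine(pair_feature, sig["text"])
        + (1.0 - alpha) * cosine(pair_feature, sig["visual"])
        for name, sig in registry.items()
    }

# Toy registry: in practice the signatures would come from a VLM's text and
# image encoders; these hand-written vectors are placeholders.
registry = {
    "ride bicycle": {"text": [1.0, 0.0, 0.0], "visual": [0.9, 0.1, 0.0]},
    "repair bicycle": {"text": [0.0, 1.0, 0.0], "visual": [0.1, 0.9, 0.0]},
}

pair_feature = [0.8, 0.2, 0.0]  # placeholder embedding of a detected human-object pair
scores = score_interactions(pair_feature, registry)
best = max(scores, key=scores.get)
print(best, scores)
```

Because nothing is learned, adding a new or rare interaction in this sketch amounts to adding one more registry entry, which is the general appeal of training-free scoring.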

Findings

01 DYSCO outperforms existing training-free models in HOI detection.

02 It achieves results competitive with training-based approaches.

03 It is particularly effective at recognizing rare interactions.

Abstract

Human-Object Interaction (HOI) detection aims to identify humans and objects within images and interpret their interactions. Existing HOI methods rely heavily on large datasets with manual annotations to learn interactions from visual cues. These annotations are labor-intensive to create, prone to inconsistency, and limit scalability to new domains and rare interactions. We argue that recent advances in Vision-Language Models (VLMs) offer untapped potential, particularly in enhancing interaction representation. While prior work has injected such potential and even proposed training-free methods, there remain key gaps. Consequently, we propose a novel training-free HOI detection framework for Dynamic Scoring with enhanced semantics (DYSCO) that effectively utilizes textual and visual interaction representations within a multimodal registry, enabling robust and nuanced interaction…
