Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection
Ting Lei, Shaofeng Yin, Yuxin Peng, and Yang Liu

TL;DR
This paper introduces CMMP (Conditional Multi-Modal Prompts), a framework for zero-shot Human-Object Interaction (HOI) detection. It improves generalization to unseen interaction categories by learning decoupled vision and language prompts and by integrating prior knowledge into the vision prompts.
Contribution
The paper proposes a new CMMP framework that enhances zero-shot HOI detection by learning decoupled prompts and incorporating prior spatial and instance knowledge for better generalization.
Findings
Outperforms previous state-of-the-art on unseen HOI classes
Effectively generalizes to various zero-shot settings
Utilizes prior knowledge to improve interaction classification
Abstract
Zero-shot Human-Object Interaction (HOI) detection has emerged as a frontier topic due to its capability to detect HOIs beyond a predefined set of categories. This task entails not only identifying the interactiveness of human-object pairs and localizing them but also recognizing both seen and unseen interaction categories. In this paper, we introduce a novel framework for zero-shot HOI detection using Conditional Multi-Modal Prompts, namely CMMP. This approach enhances the generalization of large foundation models, such as CLIP, when fine-tuned for HOI detection. Unlike traditional prompt-learning methods, we propose learning decoupled vision and language prompts for interactiveness-aware visual feature extraction and generalizable interaction classification, respectively. Specifically, we integrate prior knowledge of different granularity into conditional vision prompts, including an…
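The abstract is truncated here and does not spell out how interaction classes are scored. As a rough illustration of the general CLIP-style zero-shot classification idea that such methods build on (not the CMMP implementation), the sketch below matches one human-object pair feature against text embeddings of interaction names by cosine similarity. The toy vectors, the class names, the function names, and the temperature value are all assumptions made for illustration. In a real system the vectors would come from vision and text encoders.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_scores(visual_feature, class_embeddings):
    # Compare one visual feature against every class text embedding.
    v = l2_normalize(visual_feature)
    return {
        name: sum(a * b for a, b in zip(v, l2_normalize(emb)))
        for name, emb in class_embeddings.items()
    }

def softmax(scores, temperature=0.07):
    # Temperature-scaled softmax; 0.07 is an assumed value, not from the paper.
    top = max(scores.values())
    exps = {k: math.exp((s - top) / temperature) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Hypothetical text embeddings for interaction classes (toy 3-d vectors).
class_embeddings = {
    "ride bicycle": [0.9, 0.1, 0.0],   # treated as seen during training
    "hold bicycle": [0.6, 0.6, 0.1],   # treated as seen during training
    "wash bicycle": [0.1, 0.2, 0.9],   # treated as unseen during training
}

# Hypothetical visual feature for one human-object pair.
pair_feature = [0.2, 0.1, 0.95]

probs = softmax(cosine_scores(pair_feature, class_embeddings))
prediction = max(probs, key=probs.get)
print(prediction)
```

Because classes are represented by text embeddings rather than by a fixed classifier head, an unseen class can be scored simply by adding its name to `class_embeddings`. This is why the quality of the language-side representation matters for generalization.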
Taxonomy
Topics: Infrared Target Detection Methodologies · Radiation Detection and Scintillator Technologies · Anomaly Detection Techniques and Applications
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training
