Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model
Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang

TL;DR
This paper introduces DiffHOI, a novel HOI detection method leveraging a pre-trained text-to-image diffusion model and a synthetic dataset SynHOI, significantly improving detection accuracy and addressing data scarcity issues.
Contribution
The paper presents a new HOI detection framework using a frozen diffusion model for enhanced representations and a large-scale synthetic dataset to mitigate data imbalance.
Findings
DiffHOI achieves 41.50 mAP, outperforming state-of-the-art methods.
SynHOI reduces long-tail issues and boosts rare class detection by 11.55% mAP.
The approach improves zero-shot and model-agnostic HOI detection performance.
Abstract
This paper investigates the problems of current HOI detection methods and introduces DiffHOI, a novel HOI detection scheme grounded on a pre-trained text-to-image diffusion model, which enhances the detector's performance via improved data diversity and HOI representation. We demonstrate that the internal representation space of a frozen text-to-image diffusion model is highly relevant to verb concepts and their corresponding context. Accordingly, we propose an adapter-style tuning method to extract the various semantically associated representations from a frozen diffusion model and a CLIP model to enhance the human and object representations from the pre-trained detector, further reducing the ambiguity in interaction prediction. Moreover, to fill in the gaps of HOI datasets, we propose SynHOI, a class-balanced, large-scale, and high-diversity synthetic dataset containing over 140K HOI images…
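To make the idea of a class-balanced synthetic dataset concrete, here is a minimal sketch of how synthetic images could top up under-represented interaction classes. It is an illustration under stated assumptions, not the authors' SynHOI construction procedure: the function name `synthetic_quota`, the class names, the counts, and the choice of matching the most frequent class are all hypothetical.

```python
from collections import Counter


def synthetic_quota(real_labels, target_per_class=None):
    """Return how many synthetic images each class needs to reach a target count."""
    counts = Counter(real_labels)
    if not counts:
        return {}
    # Assumption: by default, every class is raised to the size of the largest class.
    if target_per_class is None:
        target_per_class = max(counts.values())
    # Classes already at or above the target need no synthetic images.
    return {cls: max(0, target_per_class - n) for cls, n in counts.items()}


# Hypothetical long-tailed label list; names and counts are made up for illustration.
real_labels = ["ride bicycle"] * 50 + ["hold cup"] * 20 + ["wash cat"] * 2
quota = synthetic_quota(real_labels)
print(quota)
```

In this toy setup the rarest class receives the largest share of synthetic images, which is the general mechanism by which a balanced synthetic set can offset a long-tailed distribution.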
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
Methods: Diffusion · Contrastive Language-Image Pre-training
