Improving Human-Object Interaction Detection via Virtual Image Learning
Shuman Fang, Shuai Liu, Jie Li, Guannan Jiang, Xianming Lin, Rongrong Ji

TL;DR
This paper introduces Virtual Image Learning (VIL) with a novel dataset creation method and a teacher-student training framework to improve Human-Object Interaction detection, especially addressing data imbalance issues.
Contribution
The paper proposes a new virtual image generation approach and a training framework that enhances HOI detection performance and can be integrated with existing methods.
Findings
Significant performance improvements on benchmarks
Achieved new state-of-the-art results
Effective handling of data imbalance in HOI detection
Abstract
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects, which plays a crucial role in high-level semantic understanding tasks. However, most works pursue designing better architectures to learn overall features more efficiently, while ignoring the long-tail nature of interaction-object pair categories. In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL). Firstly, a novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images. In this stage, virtual images are generated based on prompts with specific characterizations and selected by multi-filtering processes. Secondly, we use both virtual and real images to train the model with the teacher-student framework. Considering…
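The label-to-image stage described in the abstract (prompts built from labels, followed by several filtering steps) can be illustrated with a minimal Python sketch. This is not the authors' MUSIC implementation: the prompt template, the `generate` callable, the filter signature, and the names `build_prompt` and `create_virtual_images` are all hypothetical and chosen only to show the general shape of such a pipeline.

```python
from typing import Any, Callable, Iterable, List, Tuple

# An interaction-object label, e.g. ("riding", "bicycle"). Hypothetical format.
Label = Tuple[str, str]
# A filter decides whether a generated image is kept for a given label.
Filter = Callable[[Any, Label], bool]


def build_prompt(label: Label, style: str = "a photo of") -> str:
    """Turn a label into a text prompt (assumed template, not from the paper)."""
    verb, obj = label
    return f"{style} a person {verb} a {obj}"


def create_virtual_images(
    labels: Iterable[Label],
    generate: Callable[[str, int], Any],  # stand-in for any text-to-image model
    filters: List[Filter],
    per_label: int = 4,
) -> List[Tuple[Any, Label]]:
    """Generate candidates per label and keep those that pass every filter."""
    kept = []
    for label in labels:
        prompt = build_prompt(label)
        for seed in range(per_label):
            image = generate(prompt, seed)
            # Multi-filtering: an image survives only if all filters accept it.
            if all(f(image, label) for f in filters):
                kept.append((image, label))
    return kept
```

In practice `generate` would wrap a text-to-image model and each filter would check one quality criterion; sampling more candidates for rare interaction-object pairs is one way such a pipeline could target the long tail.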
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection
