CycleHOI: Improving Human-Object Interaction Detection with Cycle Consistency of Detection and Generation
Yisen Wang, Yao Teng, Limin Wang

TL;DR
CycleHOI introduces a cycle consistency training framework that leverages diffusion models to enhance human-object interaction detection, improving accuracy by integrating detection and generation tasks.
Contribution
The paper proposes a novel cycle consistency loss and feature distillation method that bridges HOI detection with pre-trained diffusion models, boosting detection performance.
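The summary does not describe how the feature distillation works. The following is a minimal sketch of one common form of feature distillation, assuming that the detector's features are linearly projected and regressed onto frozen diffusion-model features with a mean squared error. The function name, the shapes, and the projection matrix `proj` are hypothetical and are not taken from the paper.

```python
import numpy as np

def feature_distillation_loss(student_feats, teacher_feats, proj):
    """Hypothetical feature-distillation loss (MSE after a linear projection).

    student_feats: (N, d_s) features from the HOI detector (assumed shape).
    teacher_feats: (N, d_t) features from a frozen diffusion model, treated as constants.
    proj:          (d_s, d_t) learnable projection that aligns the two feature spaces.
    """
    projected = student_feats @ proj                # map detector features into teacher space
    return float(np.mean((projected - teacher_feats) ** 2))  # mean squared error
```

In a real training loop, only the detector and `proj` would receive gradients, and the teacher features would stay fixed.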
Findings
Significant performance improvements on the HICO-DET and V-COCO datasets.
Effective use of diffusion models for label correction and data augmentation.
Enhanced detection accuracy across multiple HOI frameworks.
Abstract
Recognition and generation are two fundamental tasks in computer vision, which are often investigated separately in the existing literature. However, these two tasks are highly correlated in essence, as they both require understanding the underlying semantics of visual concepts. In this paper, we propose a new learning framework, coined as CycleHOI, to boost the performance of human-object interaction (HOI) detection by bridging the DETR-based detection pipeline and the pre-trained text-to-image diffusion model. Our key design is to introduce a novel cycle consistency loss for the training of the HOI detector, which is able to explicitly leverage the knowledge captured in the powerful diffusion model to guide the HOI detector training. Specifically, we build an extra generation task on top of the decoded instance representations from the HOI detector to enforce a detection-generation cycle…
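The abstract can be read as follows: the detector's decoded instance representations serve as the conditioning signal for a frozen diffusion model, and the standard noise-prediction loss on the image then becomes a training signal for the detector. The sketch below assumes that reading. `toy_denoiser` is a hypothetical linear stand-in for the pre-trained text-to-image UNet. The linear beta schedule is the common DDPM default and is not something stated in this summary.

```python
import numpy as np

def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Cumulative products of (1 - beta_t) for a linear DDPM noise schedule.
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def toy_denoiser(x_t, cond, W_x, W_c):
    # Hypothetical stand-in for a frozen conditional UNet.
    # x_t: (B, d) noisy latents; cond: (B, Q, d_c) instance embeddings per image.
    pooled = cond.mean(axis=1)              # (B, d_c): pool over the Q instance queries
    return x_t @ W_x + pooled @ W_c         # (B, d): predicted noise

def cycle_consistency_loss(x0, instance_embeds, W_x, W_c, alpha_bars, rng):
    """Noise-prediction loss conditioned on detector outputs (assumed formulation)."""
    B = x0.shape[0]
    t = rng.integers(0, len(alpha_bars), size=B)      # random timestep per sample
    eps = rng.standard_normal(x0.shape)               # Gaussian noise to be predicted
    a = alpha_bars[t][:, None]                        # (B, 1) for broadcasting
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps    # forward diffusion q(x_t | x_0)
    eps_hat = toy_denoiser(x_t, instance_embeds, W_x, W_c)
    return float(np.mean((eps - eps_hat) ** 2))       # standard epsilon-prediction MSE
```

NumPy cannot show gradient flow. In an autograd framework, the diffusion weights (`W_x` and `W_c` here) would be frozen, and the gradient of this loss with respect to `instance_embeds` would update the detector. This is how generation could supervise detection in such a cycle.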
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Context-Aware Activity Recognition Systems
Methods: Sparse Evolutionary Training · Diffusion · Cycle Consistency Loss
