Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection
Jihwan Park, SeungJun Lee, Hwan Heo, Hyeong Kyu Choi, Hyunwoo J. Kim

TL;DR
This paper introduces cross-path consistency learning (CPC), a novel training strategy for transformer-based human-object interaction detection that enforces prediction consistency across augmented decoding paths, leading to improved performance.
Contribution
The paper proposes CPC, a new end-to-end learning method that enhances transformer HOI detection by leveraging augmented decoding paths for better consistency and generalization.
Findings
CPC yields significant performance improvements on the V-COCO and HICO-DET datasets.
CPC improves model generalization without increasing model capacity.
CPC effectively enforces prediction consistency across inference paths.
Abstract
Human-Object Interaction (HOI) detection is a holistic visual recognition task that entails object detection as well as interaction classification. Previous work has addressed HOI detection through various compositions of subset predictions, e.g., Image -> HO -> I, Image -> HI -> O. Recently, transformer-based architectures for HOI detection have emerged, which directly predict the HOI triplets in an end-to-end fashion (Image -> HOI). Motivated by the various inference paths for HOI detection, we propose cross-path consistency learning (CPC), a novel end-to-end learning strategy that improves HOI detection for transformers by leveraging augmented decoding paths. CPC learning enforces all the possible predictions from permuted inference sequences to be consistent. This simple scheme makes the model learn consistent representations, thereby improving generalization without increasing model…
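To make the idea of "enforcing consistency across decoding paths" concrete, here is a minimal sketch, not the authors' implementation. It assumes each decoding path (named after the paths in the abstract) produces logits for the same interaction class, and it penalizes disagreement with an averaged symmetric KL divergence over all path pairs. The choice of symmetric KL, the function names, and the example logits are all assumptions made for illustration; the abstract does not specify the loss.

```python
import math
from itertools import combinations


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_path_consistency(path_logits):
    """Average symmetric KL divergence over every pair of decoding paths.

    path_logits: dict mapping a (hypothetical) path name to the logits that
    path predicts for the same target, e.g. the interaction class.
    Returns 0.0 when fewer than two paths are given.
    """
    probs = {name: softmax(logits) for name, logits in path_logits.items()}
    pairs = list(combinations(probs, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        # Symmetric KL is an assumption here, chosen so no path is privileged.
        total += 0.5 * (kl_divergence(probs[a], probs[b])
                        + kl_divergence(probs[b], probs[a]))
    return total / len(pairs)


# Illustrative logits only; these are not results from the paper.
example = {
    "Image->HOI": [2.0, 0.5, 0.1],
    "Image->HO->I": [1.5, 0.7, 0.2],
    "Image->HI->O": [0.2, 2.1, 0.3],
}
consistency_penalty = cross_path_consistency(example)
```

In a training loop, a penalty of this kind would typically be added to the usual supervised detection loss with some weighting factor, so that paths are pushed toward agreement without replacing the ground-truth supervision.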
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
