Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss
Yuxiao Wang, Yu Lei, Zhenao Wei, Weiying Xue, Xinyu Jiang, Nan Zhuang, Qi Liu

TL;DR
This paper introduces P3HOT, a framework for Human-Object conTact (HOT) detection that combines prompt guidance, human proximal perception, and a regional joint loss to reduce over-segmentation in regions with little interaction and to keep categories consistent within a region.
Contribution
The paper proposes P3HOT, integrating prompt guidance, depth-based proximal perception, and a new regional joint loss for enhanced HOT detection performance.
Findings
Achieves state-of-the-art results on HOT benchmarks.
Improves metrics such as SC-Acc., mIoU, wIoU, and AD-Acc.
Demonstrates effectiveness of depth-based perception and regional loss.
Abstract
The task of Human-Object conTact (HOT) detection involves identifying the specific areas of the human body that are touching objects. Nevertheless, current models are restricted to just one type of image, often leading to too much segmentation in areas with little interaction, and struggling to maintain category consistency within specific regions. To tackle this issue, a HOT framework, termed P3HOT, is proposed, which blends Prompt guidance and human Proximal Perception. To begin with, we utilize a semantic-driven prompt mechanism to direct the network's attention towards the relevant regions based on the correlation between image and text. Then a human proximal perception mechanism is employed to dynamically perceive key depth range around the human, using learnable parameters to effectively eliminate regions where interactions are not expected.…
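The two mechanisms named in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names (`prompt_attention`, `depth_gate`), the use of cosine similarity with a softmax for the image-text correlation, and the sigmoid-shaped depth window with `center`, `half_width`, and `sharpness` parameters are all hypothetical choices made here for clarity. In a trained model, `center` and `half_width` would stand in for the learnable parameters the abstract mentions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def prompt_attention(region_feats, text_feat):
    """Hypothetical prompt guidance: weight image regions by similarity to a text embedding.

    Returns softmax-normalized weights, one per region, that sum to 1.
    """
    sims = [cosine(f, text_feat) for f in region_feats]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def depth_gate(depth, center, half_width, sharpness=20.0):
    """Hypothetical proximal perception: soft window over normalized depth.

    Values near 1 keep a pixel; values near 0 suppress it. The window spans
    [center - half_width, center + half_width]; both bounds are assumed to be
    learnable in a real model.
    """
    above_near = sigmoid(sharpness * (depth - (center - half_width)))
    below_far = sigmoid(sharpness * ((center + half_width) - depth))
    return above_near * below_far


# Usage: three region features, one text embedding, and two depth samples.
weights = prompt_attention([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], [1.0, 0.0])
kept = depth_gate(0.4, center=0.4, half_width=0.2)       # inside the window
suppressed = depth_gate(0.9, center=0.4, half_width=0.2)  # far behind the human
```

A soft (sigmoid) window is used in this sketch rather than a hard threshold so that the window bounds stay differentiable, which is what would allow them to be learned by gradient descent.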
Taxonomy
Topics: Multi-Criteria Decision Making · Visual Attention and Saliency Detection · Visual perception and processing mechanisms
