PEAR: Phrase-Based Hand-Object Interaction Anticipation
Zichen Zhang, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang

TL;DR
PEAR is a model that jointly predicts hand-object interaction intention and manipulation. Because intention and manipulation constrain each other, it avoids the incomplete predictions of intention-only methods, and it outperforms existing methods on a new dataset, EGO-HOIP.
Contribution
The paper introduces PEAR, a model that jointly anticipates interaction intention and manipulation, and presents the new EGO-HOIP dataset for comprehensive evaluation.
Findings
PEAR outperforms existing methods in interaction anticipation tasks.
Cross-alignment effectively reduces intention uncertainty.
Bidirectional constraints ensure consistency between intention and manipulation.
Abstract
First-person hand-object interaction anticipation aims to predict the interaction process over a forthcoming period based on current scenes and prompts. This capability is crucial for embodied intelligence and human-robot collaboration. The complete interaction process involves both pre-contact interaction intention (i.e., hand motion trends and interaction hotspots) and post-contact interaction manipulation (i.e., manipulation trajectories and hand poses with contact). Existing research typically anticipates only interaction intention while neglecting manipulation, resulting in incomplete predictions and an increased likelihood of intention errors due to the lack of manipulation constraints. To address this, we propose a novel model, PEAR (Phrase-Based Hand-Object Interaction Anticipation), which jointly anticipates interaction intention and manipulation. To handle uncertainties in the…
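The abstract divides a complete interaction into pre-contact intention (hand motion trend, interaction hotspot) and post-contact manipulation (manipulation trajectory, hand poses in contact). The minimal Python sketch below illustrates what a joint prediction of both parts might contain. Everything in it is a hypothetical assumption made for illustration: the class and field names, the (x, y) pixel convention, the heatmap layout, the 21-joint hand, and the `contact_gap` check. It is not PEAR's code, and the toy gap does not stand in for the paper's bidirectional constraints.

```python
from dataclasses import dataclass
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels (assumed convention)


@dataclass
class InteractionIntention:
    """Pre-contact intention (hypothetical layout): hand motion trend plus hotspot."""
    hand_trajectory: List[Point]        # future hand positions before contact
    hotspot_heatmap: List[List[float]]  # heatmap[row][col] = contact likelihood


@dataclass
class InteractionManipulation:
    """Post-contact manipulation (hypothetical layout): trajectory plus hand poses."""
    manipulation_trajectory: List[Point]                # positions after contact
    hand_poses: List[List[Tuple[float, float, float]]]  # per step: J joints (x, y, z)


@dataclass
class InteractionAnticipation:
    """A joint prediction holding both halves of the interaction process."""
    intention: InteractionIntention
    manipulation: InteractionManipulation

    def hotspot_peak(self) -> Point:
        """Return the (x, y) pixel with the highest heatmap value."""
        best_val, best_xy = -math.inf, (0.0, 0.0)
        for row, values in enumerate(self.intention.hotspot_heatmap):
            for col, v in enumerate(values):
                if v > best_val:
                    best_val, best_xy = v, (float(col), float(row))
        return best_xy

    def contact_gap(self) -> float:
        """Pixel distance from the hotspot peak to the first post-contact point.

        A toy consistency signal (not PEAR's method): a small gap means the
        intention and manipulation predictions agree on where contact happens.
        """
        hx, hy = self.hotspot_peak()
        mx, my = self.manipulation.manipulation_trajectory[0]
        return math.hypot(hx - mx, hy - my)


# Usage: a 6x8 heatmap peaking at (x=5, y=2); manipulation starts at (5, 6).
heatmap = [[0.0] * 8 for _ in range(6)]
heatmap[2][5] = 0.9
pred = InteractionAnticipation(
    InteractionIntention([(0.0, 0.0), (3.0, 1.0)], heatmap),
    InteractionManipulation([(5.0, 6.0), (6.0, 7.0)], [[(0.0, 0.0, 0.0)] * 21] * 2),
)
print(pred.contact_gap())  # 4.0
```

Grouping both halves in one object reflects the abstract's point that an intention-only prediction is incomplete. In this sketch, a large gap would flag an intention that no plausible manipulation follows.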
Taxonomy
Topics: Speech and dialogue systems · AI in Service Interactions · Hand Gesture Recognition Systems
