F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
Jie Yang, Xuesong Niu, Nan Jiang, Ruimao Zhang, Siyuan Huang

TL;DR
This paper introduces a new dataset and a unified multimodal model for fine-grained semantic understanding of 3D human-object interactions, enabling detailed state-level analysis and diverse task handling.
Contribution
It presents Semantic-HOI, a large dataset with detailed HOI states, and F-HOI, a versatile model that aligns HOI with multimodal instructions across 2D, 3D, and language.
Findings
F-HOI effectively aligns HOI states with semantic descriptions.
The dataset enables detailed understanding of HOI transitions.
F-HOI performs well on various understanding and generation tasks.
Abstract
Existing 3D human-object interaction (HOI) datasets and models simply align global descriptions with the long HOI sequence, while lacking a detailed understanding of intermediate states and the transitions between states. In this paper, we argue that fine-grained semantic alignment, which utilizes state-level descriptions, offers a promising paradigm for learning semantically rich HOI representations. To achieve this, we introduce Semantic-HOI, a new dataset comprising over 20K paired HOI states with fine-grained descriptions for each HOI state and the body movements that happen between two consecutive states. Leveraging the proposed dataset, we design three state-level HOI tasks to accomplish fine-grained semantic alignment within the HOI sequence. Additionally, we propose a unified model called F-HOI, designed to leverage multimodal instructions and empower the Multi-modal Large…
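To make the idea of "paired HOI states" concrete, here is a minimal sketch of how such state-level annotations could be organized. It is an illustration only, not the actual Semantic-HOI format: the class names, field names, and the `pairs_from_sequence` helper are all hypothetical, and the real dataset also carries 2D and 3D data that this sketch omits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HOIState:
    """One HOI state with its fine-grained text description (hypothetical schema)."""
    description: str


@dataclass
class HOIStatePair:
    """Two consecutive states plus the body movement between them (hypothetical schema)."""
    current: HOIState
    following: HOIState
    movement: str


def pairs_from_sequence(states: List[HOIState], movements: List[str]) -> List[HOIStatePair]:
    """Turn an ordered list of states into consecutive pairs.

    movements[i] describes the transition from states[i] to states[i + 1],
    so a sequence of n states needs exactly n - 1 movement descriptions.
    """
    if len(movements) != len(states) - 1:
        raise ValueError("need exactly one movement per consecutive state pair")
    return [
        HOIStatePair(states[i], states[i + 1], movements[i])
        for i in range(len(movements))
    ]


# Made-up example descriptions, not taken from the dataset.
example_states = [
    HOIState("The person stands beside the chair."),
    HOIState("The person grips the chair back with both hands."),
    HOIState("The person lifts the chair off the floor."),
]
example_movements = [
    "Both arms reach forward and the hands close around the chair back.",
    "The arms bend and raise the chair.",
]
example_pairs = pairs_from_sequence(example_states, example_movements)
```

Under this layout a sequence of n states yields n - 1 pairs, and each intermediate state appears once as the end of one pair and once as the start of the next, which is what lets a description of the transition sit between two state descriptions.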
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · 3D Shape Modeling and Analysis
Methods: ALIGN
