F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

Jie Yang; Xuesong Niu; Nan Jiang; Ruimao Zhang; Siyuan Huang

arXiv:2407.12435 · cs.CV · July 18, 2024

Open Access

TL;DR

This paper introduces a new dataset and a unified multimodal model for fine-grained semantic understanding of 3D human-object interactions (HOI), supporting state-level analysis across a range of understanding and generation tasks.

Contribution

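The paper presents Semantic-HOI, a dataset of over 20K paired HOI states with fine-grained descriptions of each state and of the body movements between consecutive states, and F-HOI, a unified model that aligns HOI with multimodal instructions across 2D, 3D, and language.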

Findings

01 F-HOI effectively aligns HOI states with semantic descriptions.

02 The dataset enables detailed understanding of HOI transitions.

03 F-HOI performs well on various understanding and generation tasks.

Abstract

Existing 3D human-object interaction (HOI) datasets and models simply align global descriptions with the long HOI sequence, while lacking a detailed understanding of intermediate states and the transitions between states. In this paper, we argue that fine-grained semantic alignment, which utilizes state-level descriptions, offers a promising paradigm for learning semantically rich HOI representations. To achieve this, we introduce Semantic-HOI, a new dataset comprising over 20K paired HOI states with fine-grained descriptions for each HOI state and the body movements that happen between two consecutive states. Leveraging the proposed dataset, we design three state-level HOI tasks to accomplish fine-grained semantic alignment within the HOI sequence. Additionally, we propose a unified model called F-HOI, designed to leverage multimodal instructions and empower the Multi-modal Large…
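
The paired-state structure the abstract describes (a description for each HOI state, plus a description of the body movement between two consecutive states) can be pictured with a small record type. This is only an illustrative sketch: the class names, field names, and the `pair_consecutive` helper are hypothetical and are not taken from the Semantic-HOI release, whose actual schema is not given here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HOIState:
    """One interaction state with its fine-grained description (hypothetical schema)."""
    state_id: int
    description: str


@dataclass
class PairedHOIStates:
    """Two consecutive states plus the body movement between them (hypothetical schema)."""
    start: HOIState
    end: HOIState
    movement: str


def pair_consecutive(states: List[HOIState], movements: List[str]) -> List[PairedHOIStates]:
    """Pair each state with its successor; movements[i] describes the transition i -> i+1."""
    expected = max(len(states) - 1, 0)
    if len(movements) != expected:
        raise ValueError("expected one movement description per consecutive state pair")
    return [
        PairedHOIStates(start=a, end=b, movement=m)
        for a, b, m in zip(states, states[1:], movements)
    ]
```

Under this assumed layout, a sequence of N states yields N - 1 paired records, and each intermediate state appears once as the end of one pair and once as the start of the next.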

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · 3D Shape Modeling and Analysis

Methods: ALIGN