DAViD: Modeling Dynamic Affordance of 3D Objects Using Pre-trained Video Diffusion Models

Hyeonwoo Kim; Sangwon Baik; Hanbyul Joo

arXiv:2501.08333 · cs.CV · August 12, 2025

Open Access

TL;DR

This paper introduces DAViD, a framework that models dynamic human-object interactions in 3D. It uses a pre-trained video diffusion model to generate 2D interaction videos from a given 3D object, lifts them into 3D to obtain synthetic 4D samples, and trains on those samples to learn and synthesize interaction motion patterns.

Contribution

The paper presents a new pipeline for learning 4D human-object interaction models using synthetic data and introduces a LoRA-enhanced diffusion model for capturing dynamic affordance in 3D objects.
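As a rough illustration of what "LoRA-enhanced" usually means, the sketch below shows the generic low-rank adaptation update, in which a frozen weight matrix W is augmented by a scaled low-rank product, W + (alpha / r) * B A. This summary does not say how DAViD applies LoRA inside its diffusion model, so the sketch is not the authors' implementation; the function names, matrix sizes, and the `alpha` value are all assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the generic LoRA-adapted weight.

    w: frozen weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def apply(weight, x):
    """Apply a weight matrix to a single input vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]


# Hypothetical sizes, for illustration only.
random.seed(0)
d_out, d_in, rank = 3, 4, 2
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]  # B starts at zero in standard LoRA

x = [1.0, 2.0, 3.0, 4.0]
base_out = apply(w, x)
lora_out = apply(lora_weight(w, a, b, alpha=4.0), x)
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly. Only A and B would be trained, which is the usual reason this kind of adapter is used to add new concepts to a pre-trained model without overwriting what it already knows.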

Findings

01. DAViD outperforms baselines in HOI motion synthesis.

02. The pipeline effectively integrates new HOI concepts with pre-trained motions.

03. Synthetic 4D samples enable learning from limited data.

Abstract

Modeling how humans interact with objects is crucial for AI to effectively assist or mimic human behaviors. Existing studies for learning such ability primarily focus on static human-object interaction (HOI) patterns, such as contact and spatial relationships, while dynamic HOI patterns, capturing the movement of humans and objects over time, remain relatively underexplored. In this paper, we present a novel framework for learning Dynamic Affordance across various target object categories. To address the scarcity of 4D HOI datasets, our method learns the 3D dynamic affordance from synthetically generated 4D HOI samples. Specifically, we propose a pipeline that first generates 2D HOI videos from a given 3D target object using a pre-trained video diffusion model, then lifts them into 3D to generate 4D HOI samples. Leveraging these synthesized 4D HOI samples, we train DAViD, our generative…

Taxonomy

Topics: 3D Shape Modeling and Analysis

Methods: Diffusion · Focus