OOD-HOI: Text-Driven 3D Whole-Body Human-Object Interactions Generation Beyond Training Domains
Yixuan Zhang, Hui Yang, Chuanchen Luo, Junran Peng, Yuxi Wang, Zhaoxiang Zhang

TL;DR
This paper introduces OOD-HOI, a text-driven framework that generates realistic 3D whole-body human-object interactions and generalizes to new objects and actions, addressing the challenges of data scarcity and physical plausibility.
Contribution
The paper proposes a dual-branch diffusion model with contact-guided refinement and dynamic adaptation for robust, out-of-domain 3D human-object interaction generation from text.
Findings
Outperforms existing methods in realism and physical plausibility
Effective in out-of-domain scenarios with new objects and actions
Demonstrates robustness and generalization in 3D interaction synthesis
Abstract
Generating realistic 3D human-object interactions (HOIs) from text descriptions is an active research topic with potential applications in virtual and augmented reality, robotics, and animation. However, creating high-quality 3D HOIs remains challenging due to the lack of large-scale interaction data and the difficulty of ensuring physical plausibility, especially in out-of-domain (OOD) scenarios. Current methods tend to focus either on the body or the hands, which limits their ability to produce cohesive and realistic interactions. In this paper, we propose OOD-HOI, a text-driven framework for generating whole-body human-object interactions that generalize well to new objects and actions. Our approach integrates a dual-branch reciprocal diffusion model to synthesize initial interaction poses, a contact-guided interaction refiner to improve physical accuracy based on predicted contact…
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Hand Gesture Recognition Systems
Methods: Focus · Diffusion
