TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions
Ilya A. Petrov, Riccardo Marin, Julian Chibane, Gerard Pons-Moll

TL;DR
TriDi introduces a unified three-way diffusion model for 3D human-object interaction, capable of generating human, object, and interaction data simultaneously, surpassing prior one-way models in diversity and quality.
Contribution
TriDi is the first model to unify 3D human-object interaction modeling in any direction, using a single diffusion process and transformer architecture.
Findings
Outperforms specialized baselines on GRAB and BEHAVE datasets.
Generates diverse and high-quality 3D human-object interaction samples.
Demonstrates applicability to scene population and generalization to unseen objects.
Abstract
Modeling 3D human-object interaction (HOI) is a problem of great interest for computer vision and a key enabler for virtual and mixed-reality applications. Existing methods work in a one-way direction: some recover plausible human interactions conditioned on a 3D object; others recover the object pose conditioned on a human pose. Instead, we provide the first unified model, TriDi, which works in any direction. Concretely, we generate Human, Object, and Interaction modalities simultaneously with a new three-way diffusion process, which allows us to model seven distributions with one network. We implement TriDi as a transformer attending to the various modalities' tokens, thereby discovering conditional relations between them. The user can control the interaction either as a text description of HOI or a contact map. We embed these two representations into a shared latent space, combining the…
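The abstract's count of seven distributions can be reproduced with a short enumeration. The sketch below assumes (this reading is not spelled out in the text above) that each distribution corresponds to choosing a non-empty subset of the three modalities to generate while conditioning on the rest. The names `MODALITIES`, `sampling_modes`, and `describe` are hypothetical and not taken from the authors' code.

```python
from itertools import combinations

# Assumed shorthand: H = Human, O = Object, I = Interaction.
MODALITIES = ("H", "O", "I")


def sampling_modes(modalities=MODALITIES):
    """List (generated, given) pairs: every non-empty subset of modalities
    is generated, and the remaining modalities act as conditioning."""
    modes = []
    for k in range(1, len(modalities) + 1):
        for generated in combinations(modalities, k):
            given = tuple(m for m in modalities if m not in generated)
            modes.append((generated, given))
    return modes


def describe(generated, given):
    """Format one mode as a joint or conditional distribution string."""
    target = ",".join(generated)
    if given:
        return f"p({target} | {','.join(given)})"
    return f"p({target})"


if __name__ == "__main__":
    for generated, given in sampling_modes():
        print(describe(generated, given))
```

Under this assumption, three modalities give 2^3 - 1 = 7 modes: three that generate one modality from the other two, three that generate two modalities from one, and the unconditional joint over all three.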
Taxonomy
Topics: 3D Shape Modeling and Analysis
Methods: Diffusion
