CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images

Sookwan Han; Hanbyul Joo

arXiv:2308.12288·cs.CV·September 6, 2023

TL;DR

This paper introduces CHORUS, a self-supervised method that learns 3D human-object spatial relations from images synthesized by a text-to-image model, removing the need for 3D interaction annotations that are difficult to collect at scale.

Contribution

CHORUS is presented as the first method to use a generative image model for learning 3D human-object spatial relations, and it proposes a comprehensive framework for 3D reasoning from synthesized 2D cues.

Findings

01. Synthesized images are sufficient for learning 3D spatial relations.

02. The method effectively disambiguates interaction types via semantic clustering.

03. A new metric evaluates 3D spatial learning quality.

Abstract

We present a method for teaching machines to understand and model the underlying spatial common sense of diverse human-object interactions in 3D in a self-supervised way. This is a challenging task, as there exist specific manifolds of the interactions that can be considered human-like and natural, but the human pose and the geometry of objects can vary even for similar interactions. Such diversity makes the annotating task of 3D interactions difficult and hard to scale, which limits the potential to reason about that in a supervised way. One way of learning the 3D spatial relationship between humans and objects during interaction is by showing multiple 2D images captured from different viewpoints when humans interact with the same type of objects. The core idea of our method is to leverage a generative model that produces high-quality 2D images from an arbitrary text prompt input as an…

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning