Open-Vocabulary Functional 3D Human-Scene Interaction Generation

Jie Liu; Yu Sun; Alpar Cseke; Yao Feng; Nicolas Heron; Michael J. Black; Yan Zhang

arXiv:2601.20835 · cs.CV · February 2, 2026

TL;DR

This paper introduces FunHSI, a framework that generates functionally correct 3D human-scene interactions from open-vocabulary prompts by reasoning about object functionality and contact, improving plausibility and diversity.

Contribution

FunHSI is a training-free, functionality-driven method that explicitly reasons about object functionality and the corresponding human-scene contact to produce realistic 3D human-scene interactions from open-vocabulary task prompts.

Findings

1. Generates more plausible 3D human-scene interactions than existing methods.
2. Supports fine-grained functional interactions like adjusting room temperature.
3. Works across diverse indoor and outdoor scenes.

Abstract

Generating 3D humans that functionally interact with 3D scenes remains an open problem with applications in embodied AI, robotics, and interactive content creation. The key challenge involves reasoning about both the semantics of functional elements in 3D scenes and the 3D human poses required to achieve functionality-aware interaction. Unfortunately, existing methods typically lack explicit reasoning over object functionality and the corresponding human-scene contact, resulting in implausible or functionally incorrect interactions. In this work, we propose FunHSI, a training-free, functionality-driven framework that enables functionally correct human-scene interactions from open-vocabulary task prompts. Given a task prompt, FunHSI performs functionality-aware contact reasoning to identify functional scene elements, reconstruct their 3D geometry, and model high-level interactions via a…

Taxonomy

Topics: Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis · Human Motion and Animation