InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement

Yude Zou; Junji Gong; Xing Gao; Zixuan Li; Tianxing Chen; Guanjie Zheng

arXiv:2604.04843 · cs.CV · April 7, 2026

TL;DR

InfBaGel introduces a novel framework for human-object-scene interaction generation that combines dynamic perception, iterative refinement, and hybrid training to produce realistic, consistent interactions in complex scenes.

Contribution

The paper presents a new coarse-to-fine, instruction-conditioned generation framework with dynamic perception and bump-aware guidance, addressing data scarcity and improving interaction realism.

Findings

01 Achieves state-of-the-art performance in HOSI and HOI generation.

02 Demonstrates strong generalization to unseen scenes.

03 Enables real-time interaction generation without detailed scene geometry.

Abstract

Human-object-scene interaction (HOSI) generation has broad applications in embodied AI, simulation, and animation. Unlike human-object interaction (HOI) and human-scene interaction (HSI) generation, HOSI generation requires reasoning over dynamic object-scene changes, yet suffers from limited annotated data. To address these issues, we propose a coarse-to-fine, instruction-conditioned interaction generation framework that is explicitly aligned with the iterative denoising process of a consistency model. In particular, we adopt a dynamic perception strategy: at each denoising step of the consistency model, trajectories from the preceding refinement are used to update the scene context, which then conditions the subsequent refinement, yielding consistent interactions. To further reduce physical artifacts, we introduce bump-aware guidance that mitigates collisions and penetrations during sampling without requiring…
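The coarse-to-fine loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `perceive`, `toy_denoise`, and `sample` are hypothetical names, the learned consistency model is replaced by a hand-written function on 2D waypoints, and the noise schedule is a simplified form of multistep consistency sampling. It only shows the control flow in which each refinement step re-perceives the scene from the previous trajectory estimate before denoising again.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def perceive(traj: List[Point], obstacles: List[Point], radius: float) -> List[Point]:
    # Hypothetical "dynamic perception": keep only the obstacles that lie
    # within `radius` of some waypoint of the current trajectory estimate.
    return [o for o in obstacles if any(math.dist(p, o) <= radius for p in traj)]


def toy_denoise(noisy: List[Point], sigma: float, start: Point, goal: Point,
                context: List[Point], clearance: float) -> List[Point]:
    # Stand-in for a learned consistency function f(x_sigma, sigma, condition).
    # Assumption: it blends the noisy input with a straight start-to-goal line
    # and moves waypoints out of the clearance disc of each perceived obstacle.
    n = len(noisy)
    w = 1.0 / (1.0 + sigma)  # trust the input more as the noise level shrinks
    out: List[Point] = []
    for i, (x, y) in enumerate(noisy):
        a = i / (n - 1)
        lx = start[0] + a * (goal[0] - start[0])
        ly = start[1] + a * (goal[1] - start[1])
        px, py = w * x + (1 - w) * lx, w * y + (1 - w) * ly
        for ox, oy in context:
            d = math.dist((px, py), (ox, oy))
            if d < clearance:
                if d == 0.0:
                    px, py = ox, oy + clearance
                else:
                    s = clearance / d
                    px, py = ox + (px - ox) * s, oy + (py - oy) * s
        out.append((px, py))
    return out


def sample(start: Point, goal: Point, obstacles: List[Point], sigmas: List[float],
           n_points: int = 16, radius: float = 1.5, clearance: float = 0.5,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    rng = random.Random(seed)

    def noise(s: float) -> List[Point]:
        return [(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n_points)]

    # Coarse pass: denoise pure noise with no scene context yet.
    x = toy_denoise(noise(sigmas[0]), sigmas[0], start, goal, [], clearance)
    history: List[int] = []  # number of perceived obstacles per refinement step
    for sigma in sigmas[1:]:
        # Update the scene context from the preceding refinement's trajectory.
        context = perceive(x, obstacles, radius)
        history.append(len(context))
        # Re-noise to a smaller level, then denoise under the new context.
        z = noise(sigma)
        noisy = [(px + zx, py + zy) for (px, py), (zx, zy) in zip(x, z)]
        x = toy_denoise(noisy, sigma, start, goal, context, clearance)
    return x, history
```

A call such as `sample((0.0, 0.0), (5.0, 0.0), [(2.5, 0.0)], [2.0, 0.5, 0.1])` returns the refined waypoints together with the number of obstacles perceived at each refinement step. The obstacle push inside `toy_denoise` is only a placeholder for conditioning on scene context; it is not the bump-aware guidance from the paper, whose details are not given in the abstract shown here.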

Code & Models

Repositories

https://yudezou.github.io/InfBaGel-page
github

Videos

InfBaGel: Human-Object-Scene Interaction Generation with Dynamic Perception and Iterative Refinement · SlidesLive