OneHOI: Unifying Human-Object Interaction Generation and Editing

Jiun Tian Hoe; Weipeng Hu; Xudong Jiang; Yap-Peng Tan; Chee Seng Chan

arXiv:2604.14062·cs.CV·April 16, 2026

TL;DR

OneHOI is a unified diffusion transformer framework that handles human-object interaction (HOI) generation and editing in a single model. It models verb-mediated relations explicitly, disentangles multiple interactions within a scene, and achieves state-of-the-art results on both tasks.

Contribution

OneHOI consolidates HOI generation and editing into a single conditional denoising process, driven by shared structured interaction representations and new attention mechanisms.

Findings

01. Achieves state-of-the-art results in both HOI generation and HOI editing.

02. Supports diverse control conditions, including layout guidance and arbitrary masks.

03. Models multi-HOI scenes effectively by keeping the representation of each interaction disentangled.

Abstract

Human-Object Interaction (HOI) modelling captures how humans act upon and relate to objects, typically expressed as <person, action, object> triplets. Existing approaches split into two disjoint families: HOI generation synthesises scenes from structured triplets and layout, but fails to integrate mixed conditions like HOI and object-only entities; and HOI editing modifies interactions via text, yet struggles to decouple pose from physical contact and scale to multiple interactions. We introduce OneHOI, a unified diffusion transformer framework that consolidates HOI generation and editing into a single conditional denoising process driven by shared structured interaction representations. At its core, the Relational Diffusion Transformer (R-DiT) models verb-mediated relations through role- and instance-aware HOI tokens, layout-based spatial Action Grounding, a Structured HOI Attention to…
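
To make the structured conditioning described in the abstract more concrete, here is a minimal Python sketch of how <person, action, object> triplets and object-only entities might be flattened into role- and instance-tagged condition tokens. It is an illustration under stated assumptions, not the authors' implementation: the names `Entity`, `HOITriplet`, `ConditionToken`, `union_box`, and `build_tokens` are hypothetical, and using the union of the person and object boxes as the action region is an assumption made here only to show one possible form of layout-based grounding.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Axis-aligned box (x0, y0, x1, y1), normalised to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Entity:
    """A person or object, optionally placed by a layout box."""
    label: str
    box: Optional[Box] = None


@dataclass(frozen=True)
class HOITriplet:
    """One <person, action, object> interaction."""
    person: Entity
    action: str
    obj: Entity


@dataclass(frozen=True)
class ConditionToken:
    """Hypothetical condition token carrying a role and an instance id."""
    text: str
    role: str          # "person", "action", "object", or "entity"
    instance: int      # index of the interaction; -1 for object-only entities
    box: Optional[Box]


def union_box(a: Optional[Box], b: Optional[Box]) -> Optional[Box]:
    """Smallest box covering both inputs; None if either box is missing."""
    if a is None or b is None:
        return None
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def build_tokens(
    triplets: Sequence[HOITriplet],
    extra_entities: Sequence[Entity] = (),
) -> List[ConditionToken]:
    """Flatten mixed conditions (HOI triplets plus object-only entities)."""
    tokens: List[ConditionToken] = []
    for i, t in enumerate(triplets):
        tokens.append(ConditionToken(t.person.label, "person", i, t.person.box))
        # Assumption: the action is grounded on the person/object union region.
        action_box = union_box(t.person.box, t.obj.box)
        tokens.append(ConditionToken(t.action, "action", i, action_box))
        tokens.append(ConditionToken(t.obj.label, "object", i, t.obj.box))
    for e in extra_entities:
        tokens.append(ConditionToken(e.label, "entity", -1, e.box))
    return tokens
```

Tagging every token with an instance index is one simple way to keep several interactions in the same scene separable, while the `-1` index marks entities that take part in no interaction.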

Code & Models

Models

jiuntian/OneHOI (model, 57 downloads)
