AnchorHOI: Zero-shot Generation of 4D Human-Object Interaction via Anchor-based Prior Distillation

Sisi Dai; Kai Xu

arXiv:2512.14095 · cs.CV · December 17, 2025

TL;DR

AnchorHOI is a zero-shot framework for 4D human-object interaction (HOI) generation. It combines hybrid priors with anchor-based distillation and improves diversity and generalization over previous methods.

Contribution

The paper proposes AnchorHOI, a new framework that uses anchor-based prior distillation with video diffusion models for scalable zero-shot 4D HOI generation.

Findings

01. Outperforms previous methods in diversity and generalization

02. Effectively incorporates interaction-aware anchors for realistic motion synthesis

03. Demonstrates superior results through extensive experiments

Abstract

Despite significant progress in text-driven 4D human-object interaction (HOI) generation with supervised methods, the scalability remains limited by the scarcity of large-scale 4D HOI datasets. To overcome this, recent approaches attempt zero-shot 4D HOI generation with pre-trained image diffusion models. However, interaction cues are minimally distilled during the generation process, restricting their applicability across diverse scenarios. In this paper, we propose AnchorHOI, a novel framework that thoroughly exploits hybrid priors by incorporating video diffusion models beyond image diffusion models, advancing 4D HOI generation. Nevertheless, directly optimizing high-dimensional 4D HOI with such priors remains challenging, particularly for human pose and compositional motion. To address this challenge, AnchorHOI introduces an anchor-based prior distillation strategy, which constructs…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition