HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness

Zihui Xue; Mi Luo; Changan Chen; Kristen Grauman

arXiv:2406.07754 · cs.CV · November 12, 2024

TL;DR

HOI-Swap is a diffusion-based video editing framework that swaps hand-held objects in videos given a single reference object image. It adapts the hand-object interaction to the new object, a case where existing diffusion models often produce unrealistic edits.

Contribution

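The paper introduces a self-supervised, two-stage diffusion-based method for swapping objects in videos. The first stage edits a single frame while adjusting the interaction pattern, such as the hand grasp, to the new object's properties; the second stage extends that single-frame edit across the entire sequence.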

Findings

1. Outperforms existing methods in quality and realism.
2. Effectively preserves hand-object interaction patterns.
3. Enables controllable motion alignment in edited videos.

Abstract

We study the problem of precisely swapping objects in videos, with a focus on those interacted with by hands, given one user-provided reference object image. Despite the great advancements that diffusion models have made in video editing recently, these models often fall short in handling the intricacies of hand-object interactions (HOI), failing to produce realistic edits -- especially when object swapping results in object shape or functionality changes. To bridge this gap, we present HOI-Swap, a novel diffusion-based video editing framework trained in a self-supervised manner. Designed in two stages, the first stage focuses on object swapping in a single frame with HOI awareness; the model learns to adjust the interaction patterns, such as the hand grasp, based on changes in the object's properties. The second stage extends the single-frame edit across the entire sequence; we achieve…

Videos

HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Visual Attention and Saliency Detection · Face recognition and analysis

Methods: Focus · Diffusion