SpriteHand: Real-Time Versatile Hand-Object Interaction with Autoregressive Video Generation
Zisu Li, Hengye Lyu, Jiaxin Shi, Yufeng Zeng, Mingming Fan, Hanwang Zhang, Chen Liang

TL;DR
SpriteHand is a real-time autoregressive video generation framework that synthesizes diverse, realistic hand-object interactions across various object types and motions, surpassing traditional physics engines in flexibility and visual quality.
Contribution
The paper introduces SpriteHand, a novel autoregressive model capable of real-time, versatile hand-object interaction video synthesis with improved realism and coherence over existing methods.
Findings
Supports 18 FPS at 640x368 resolution
Achieves high visual realism and interaction fidelity
Handles a wide range of object types and motions
Abstract
Modeling and synthesizing complex hand-object interactions remains a significant challenge, even for state-of-the-art physics engines. Conventional simulation-based approaches rely on explicitly defined rigid object models and pre-scripted hand gestures, making them inadequate for capturing dynamic interactions with non-rigid or articulated entities such as deformable fabrics, elastic materials, hinge-based structures, furry surfaces, or even living creatures. In this paper, we present SpriteHand, an autoregressive video generation framework for real-time synthesis of versatile hand-object interaction videos across a wide range of object types and motion patterns. SpriteHand takes as input a static object image and a video stream in which the hands are imagined to interact with the virtual object embedded in a real-world scene, and generates corresponding hand-object interaction effects…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHuman Motion and Animation · Generative Adversarial Networks and Image Synthesis · Robot Manipulation and Learning
