Controllable Human-centric Keyframe Interpolation with Generative Prior

Zujin Guo; Size Wu; Zhongang Cai; Wei Li; Chen Change Loy

arXiv:2506.03119 · cs.CV · January 1, 2026

Open Access

TL;DR

This paper introduces PoseFuse3D-KI, a novel framework that enhances human-centric keyframe interpolation by integrating 3D human guidance signals into diffusion models, resulting in more accurate and controllable intermediate frame generation.

Contribution

We propose PoseFuse3D-KI, a new method that incorporates 3D human geometry into diffusion-based interpolation, improving plausibility and control over complex articulated motions.

Findings

01. Outperforms state-of-the-art baselines, with 9% higher PSNR.

02. Achieves a 38% reduction in LPIPS, indicating better perceptual quality.

03. Comprehensive ablations show that PoseFuse3D improves interpolation fidelity.
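
To make the PSNR figure above concrete, here is a minimal sketch of how PSNR is usually computed between a ground-truth frame and an interpolated frame, and how a relative gain over a baseline could be expressed. This is not the paper's evaluation code. The function names `psnr` and `relative_gain` are hypothetical, frames are assumed to be flat sequences of 8-bit pixel values, and reading the 9% as a relative gain in dB is an assumption, since this summary does not say how the percentage was derived.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_gain(baseline_score, new_score):
    """Relative change of new_score with respect to baseline_score (0.09 means 9%)."""
    return (new_score - baseline_score) / baseline_score


if __name__ == "__main__":
    # Hypothetical pixel values for illustration only, not data from the paper.
    ground_truth = [52, 60, 61, 200]
    interpolated = [50, 63, 61, 190]
    print(psnr(ground_truth, interpolated))
    # Hypothetical dB scores for a baseline and an improved method.
    print(relative_gain(20.0, 21.8))
```

Higher PSNR means lower pixel-wise error, whereas LPIPS is a learned perceptual distance for which lower is better, so the two findings point in the same direction.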

Abstract

Existing interpolation methods use pre-trained video diffusion priors to generate intermediate frames between sparsely sampled keyframes. In the absence of 3D geometric guidance, these methods struggle to produce plausible results for complex, articulated human motions and offer limited control over the synthesized dynamics. In this paper, we introduce PoseFuse3D Keyframe Interpolator (PoseFuse3D-KI), a novel framework that integrates 3D human guidance signals into the diffusion process for Controllable Human-centric Keyframe Interpolation (CHKI). To provide rich spatial and structural cues for interpolation, our PoseFuse3D, a 3D-informed control model, features a novel SMPL-X encoder that transforms 3D geometry and shape into the 2D latent conditioning space, alongside a fusion network that integrates these 3D cues with 2D pose embeddings. For evaluation, we build CHKI-Video, a new…

Taxonomy

Topics: Hand Gesture Recognition Systems · Robotics and Automated Systems

Methods: Diffusion