TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models

Jeongho Kim, Min-Jung Kim, Junsoo Lee, and Jaegul Choo

arXiv:2407.09012·cs.CV·July 15, 2024·1 cites

Open Access

TL;DR

TCAN is a novel pose-driven human image animation method that achieves temporal consistency and robustness to pose detection errors by leveraging pre-trained diffusion models, LoRA adaptation, and a temporal layer.

Contribution

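It introduces a method that keeps the pre-trained ControlNet frozen, adapts the UNet layers with LoRA to align pose and appearance features, and adds a temporal layer to the ControlNet, improving the robustness and temporal consistency of human image animation without fine-tuning the ControlNet.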

Findings

01

Achieves high-quality, temporally consistent human video synthesis.

02

Robust to erroneous pose detections and outliers.

03

Effective across a variety of pose scenarios, including chibi.

Abstract

Pose-driven human-image animation diffusion models have shown remarkable capabilities in realistic human video synthesis. Despite the promising results achieved by previous approaches, challenges persist in achieving temporally consistent animation and ensuring robustness with off-the-shelf pose detectors. In this paper, we present TCAN, a pose-driven human image animation method that is robust to erroneous poses and consistent over time. In contrast to previous methods, we utilize the pre-trained ControlNet without fine-tuning to leverage its extensive pre-acquired knowledge from numerous pose-image-caption pairs. To keep the ControlNet frozen, we adapt LoRA to the UNet layers, enabling the network to align the latent space between the pose and appearance features. Additionally, by introducing an additional temporal layer to the ControlNet, we enhance robustness against outliers of the…
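
The idea of adapting a network with LoRA while leaving pre-trained weights untouched can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `LoRALinear`, the helper `matvec`, and the parameters `rank` and `alpha` are hypothetical, and the example uses a single toy linear layer rather than UNet layers. It only shows the general low-rank update, where the output is the frozen layer's output plus a scaled product of two small trainable matrices.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Toy linear layer with a frozen weight and a trainable low-rank update.

    Hypothetical illustration only: output = W x + (alpha / rank) * B (A x).
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        # Frozen pre-trained weight, stored as tuples so it cannot be mutated.
        self.weight = tuple(tuple(row) for row in weight)
        self.scale = alpha / rank
        # A starts small and random; B starts at zero, so the layer initially
        # reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_parameters(self):
        # Only the low-rank factors would be updated during training.
        return [self.A, self.B]


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
x = [1.0, -1.0]
initial_output = layer.forward(x)  # identical to the frozen layer's output
```

Because `B` is initialized to zero, training starts from the behavior of the pre-trained layer, and only the two small factors change afterwards. The abstract excerpt here does not state the rank or which UNet layers are adapted, so those choices are left open in the sketch.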

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage

Methods: Softmax · Attention Is All You Need · ALIGN · Diffusion