TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
Jeongho Kim, Min-Jung Kim, Junsoo Lee, and Jaegul Choo

TL;DR
TCAN is a novel pose-driven human image animation method that achieves temporal consistency and robustness to pose detection errors by leveraging pre-trained diffusion models, LoRA adaptation, and a temporal layer.
Contribution
It keeps the pre-trained ControlNet frozen, adapts the UNet layers with LoRA to align pose and appearance features, and adds a temporal layer to the ControlNet, improving robustness and temporal consistency without fine-tuning the ControlNet.
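To illustrate the general idea of adapting a layer with LoRA while its pre-trained weights stay frozen, here is a minimal NumPy sketch. It is not the authors' implementation: the class name `LoRALinear`, the rank, the scaling, and the initialization are assumptions chosen for illustration, and a real model would apply this to attention projections inside the UNet rather than to a standalone matrix.

```python
import numpy as np


class LoRALinear:
    """Frozen linear map W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration only; names and defaults are assumptions.
    """

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # pre-trained weight, treated as frozen
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))  # trainable down-projection
        self.B = np.zeros((out_dim, rank))  # trainable up-projection, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x):
        # x has shape (batch, in_dim); the low-rank path is added to the frozen path.
        frozen = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return frozen + self.scale * update

    def trainable_parameter_count(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))       # stand-in for a pre-trained weight
layer = LoRALinear(W, rank=2)
x = rng.normal(size=(3, 16))
y = layer(x)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only the small `A` and `B` matrices (48 values here, against 128 in `W`) would be updated during training.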
Findings
Achieves high-quality, temporally consistent human video synthesis.
Robust to erroneous pose detections and outliers.
Handles varied pose scenarios, including chibi-style characters.
Abstract
Pose-driven human-image animation diffusion models have shown remarkable capabilities in realistic human video synthesis. Despite the promising results achieved by previous approaches, challenges persist in achieving temporally consistent animation and ensuring robustness with off-the-shelf pose detectors. In this paper, we present TCAN, a pose-driven human image animation method that is robust to erroneous poses and consistent over time. In contrast to previous methods, we utilize the pre-trained ControlNet without fine-tuning to leverage its extensive pre-acquired knowledge from numerous pose-image-caption pairs. To keep the ControlNet frozen, we adapt LoRA to the UNet layers, enabling the network to align the latent space between the pose and appearance features. Additionally, by introducing an additional temporal layer to the ControlNet, we enhance robustness against outliers of the…
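The abstract does not spell out the form of the temporal layer, so the following is only a generic sketch of one common choice: self-attention along the frame axis, applied independently at each spatial token. The function name `temporal_self_attention` is hypothetical, and the learned query, key, and value projections of a real layer are omitted (identity projections are assumed) to keep the example short.

```python
import numpy as np


def temporal_self_attention(features):
    """Mix information across frames for each spatial token.

    features: array of shape (frames, tokens, channels).
    Returns (mixed, weights) where mixed has the input shape and
    weights has shape (tokens, frames, frames).
    Assumption: identity query/key/value projections, no learned parameters.
    """
    frames, tokens, channels = features.shape
    x = np.transpose(features, (1, 0, 2))                      # (tokens, frames, channels)
    scores = x @ np.transpose(x, (0, 2, 1)) / np.sqrt(channels)
    scores = scores - scores.max(axis=-1, keepdims=True)       # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    mixed = weights @ x                                        # weighted average over frames
    return np.transpose(mixed, (1, 0, 2)), weights


rng = np.random.default_rng(0)
pose_features = rng.normal(size=(4, 5, 8))   # 4 frames, 5 tokens, 8 channels
smoothed, attn = temporal_self_attention(pose_features)
```

Each output frame is a weighted average over all frames at the same location, which is one way a single outlier frame can be damped by its neighbours; whether TCAN's layer behaves exactly like this is not stated in the text above.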
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage
Methods: Softmax · Attention Is All You Need · ALIGN · Diffusion
