Improving Human Image Animation via Semantic Representation Alignment
Chang Liu, Mengting Chen, Yixuan Huang, Haoning Wu, Chen Ju, Shuai Xiao, Jinsong Lan, Yanfeng Wang

TL;DR
This paper introduces SemanticREPA, a novel method for human image animation that uses semantic representation alignment to improve structural coherence and identity consistency in generated videos.
Contribution
SemanticREPA uses semantic representations as supervision signals through alignment modules, rather than as input conditions. This improves structural and identity (ID) consistency without reducing generation flexibility.
Findings
Achieves more coherent and stable human structures in generated videos.
Improves identity restoration and character consistency in extended motions.
Outperforms existing methods in generation quality on long video sequences.
Abstract
The field of image-to-video generation has made remarkable progress. However, challenges such as human limb twisting and facial distortion persist, especially when generating long videos or modeling intensive motions. Existing human image animation works address these issues by incorporating human-specific semantic representations, e.g., dense poses or ID embeddings, as additional conditions. However, conditioning on these representations could decrease the generation flexibility. Moreover, their reliance on RGB pixel supervision also lacks emphasis on learning necessary 3D geometric relationships and temporal coherence. In contrast, we introduce a novel approach named SemanticREPA that leverages these semantic representations as supervision signals through representation alignment. Specifically, we begin by training a structure alignment module that aligns the structure representations…
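The abstract describes using semantic representations as alignment targets instead of as extra input conditions. As a rough illustration only, the sketch below shows a generic REPA-style alignment loss. A small projection head (hypothetical) maps the denoiser's hidden states to the dimension of a frozen semantic feature. The loss is the negative mean cosine similarity between the two, added to the ordinary diffusion loss with a weight `lam`. The names `mlp_projector`, `alignment_loss` and `total_loss`, the two-layer ReLU projector, the per-token layout and the value of `lam` are all assumptions. The truncated abstract does not specify SemanticREPA's actual module design.

```python
import numpy as np

def mlp_projector(h, w1, b1, w2, b2):
    """Hypothetical two-layer MLP mapping hidden states (N, D_h) to target dim (N, D_t)."""
    z = np.maximum(h @ w1 + b1, 0.0)  # ReLU is an arbitrary choice for this sketch
    return z @ w2 + b2

def alignment_loss(h, target, params, eps=1e-8):
    """Negative mean cosine similarity between projected hidden states and frozen targets.

    h:      (N, D_h) per-token hidden states from the video denoiser (layer choice assumed)
    target: (N, D_t) per-token semantic features, e.g. from a frozen pose or ID encoder,
            treated as constants (no gradient flows into them)
    """
    p = mlp_projector(h, *params)
    p_unit = p / (np.linalg.norm(p, axis=-1, keepdims=True) + eps)
    t_unit = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    cos = np.sum(p_unit * t_unit, axis=-1)  # one cosine similarity per token
    return -float(np.mean(cos))  # lies in [-1, 1]; -1 means perfect alignment

def total_loss(diffusion_loss, align_loss, lam=0.5):
    """Hypothetical combined objective: diffusion loss plus weighted alignment term."""
    return diffusion_loss + lam * align_loss

# Usage with random stand-in tensors (shapes are illustrative only).
rng = np.random.default_rng(0)
h = rng.standard_normal((16, 64))  # 16 tokens, 64-dim hidden states
params = (
    rng.standard_normal((64, 128)) * 0.1, np.zeros(128),
    rng.standard_normal((128, 32)) * 0.1, np.zeros(32),
)
target = rng.standard_normal((16, 32))  # 16 tokens, 32-dim semantic features
loss = total_loss(diffusion_loss=1.0, align_loss=alignment_loss(h, target, params))
```

Because the semantic features only enter through the loss, they shape the denoiser's internal representations during training but are not required as inputs at sampling time. This is the flexibility argument the abstract makes against condition-based approaches.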
