ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images
Xianghao Kong, Qiaosong Qi, Yuanbin Wang, Biaolong Chen, Aixi Zhang, Anyi Rao

TL;DR
ProFashion is a fashion video generation framework that takes multiple reference images as input to improve view consistency and motion coherence in the synthesized videos.
Contribution
It introduces a Pose-aware Prototype Aggregator and a Flow-enhanced Prototype Instantiator to leverage multiple reference images and motion flow effectively, improving video quality.
Findings
Outperforms previous methods on the UBC Fashion dataset.
Achieves better view consistency with multiple references.
Enhances motion coherence through flow-guided attention.
Abstract
Fashion video generation aims to synthesize temporally consistent videos from reference images of a designated character. Despite significant progress, existing diffusion-based methods only support a single reference image as input, severely limiting their capability to generate view-consistent fashion videos, especially when there are different patterns on the clothes from different perspectives. Moreover, the widely adopted motion module does not sufficiently model human body movement, leading to sub-optimal spatiotemporal consistency. To address these issues, we propose ProFashion, a fashion video generation framework leveraging multiple reference images to achieve improved view consistency and temporal coherency. To effectively leverage features from multiple reference images while maintaining a reasonable computational cost, we devise a Pose-aware Prototype Aggregator, which…
