ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images

Xianghao Kong; Qiaosong Qi; Yuanbin Wang; Biaolong Chen; Aixi Zhang; Anyi Rao

arXiv:2505.06537·cs.CV·April 1, 2026


TL;DR

ProFashion is a framework for fashion video generation that takes multiple reference images as input and uses two prototype-based modules to improve view consistency and motion coherence in the synthesized videos.

Contribution

The paper introduces a Pose-aware Prototype Aggregator and a Flow-enhanced Prototype Instantiator to make effective use of multiple reference images and motion flow, improving the quality of the generated videos.

Findings

01. Outperforms previous methods on the UBC Fashion dataset.

02. Achieves better view consistency by using multiple reference images.

03. Improves motion coherence through flow-guided attention.

Abstract

Fashion video generation aims to synthesize temporally consistent videos from reference images of a designated character. Despite significant progress, existing diffusion-based methods only support a single reference image as input, severely limiting their capability to generate view-consistent fashion videos, especially when there are different patterns on the clothes from different perspectives. Moreover, the widely adopted motion module does not sufficiently model human body movement, leading to sub-optimal spatiotemporal consistency. To address these issues, we propose ProFashion, a fashion video generation framework leveraging multiple reference images to achieve improved view consistency and temporal coherency. To effectively leverage features from multiple reference images while maintaining a reasonable computational cost, we devise a Pose-aware Prototype Aggregator, which…
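
The abstract is cut off before it describes how the Pose-aware Prototype Aggregator works, so the sketch below is not the paper's method. It is only a generic illustration of the stated goal: collapsing several reference features into one fixed-size representation so that downstream cost does not grow with the number of references. The function name `aggregate_prototype`, the pose vectors, the squared-distance similarity, and the `temperature` parameter are all assumptions made for this example.

```python
import math


def aggregate_prototype(ref_features, ref_poses, target_pose, temperature=1.0):
    """Blend per-reference feature vectors into a single prototype vector.

    Hypothetical illustration only: each reference is weighted by a softmax
    over the negative squared distance between its pose and the target pose.
    """
    if not ref_features or len(ref_features) != len(ref_poses):
        raise ValueError("need one pose per reference feature vector")

    # Closer poses get higher (less negative) scores.
    scores = [
        -sum((a - b) ** 2 for a, b in zip(pose, target_pose)) / temperature
        for pose in ref_poses
    ]

    # Numerically stable softmax over the references.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum: the output size depends only on the feature dimension.
    dim = len(ref_features[0])
    prototype = [
        sum(w * feat[i] for w, feat in zip(weights, ref_features))
        for i in range(dim)
    ]
    return prototype, weights


# Two toy references (e.g. a front view and a back view), 3-dim features.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
poses = [[0.0], [1.0]]
prototype, weights = aggregate_prototype(features, poses, target_pose=[0.0])
```

In this toy setup the reference whose pose matches the target receives the larger weight, and the prototype keeps the same length however many references are supplied.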
