FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization
Chuancheng Shi, Yixiang Chen, Burong Lei, Jichao Chen

TL;DR
FashionPose is a framework that generates personalized fashion images from text in three stages: it predicts a pose, synthesizes a person image, and relights the result. This enables flexible and realistic virtual garment visualization for e-commerce.
Contribution
FashionPose introduces what the authors describe as the first unified text-to-pose-to-relighting framework, replacing explicit pose annotations with text-driven conditioning for greater flexibility.
Findings
Achieves fine-grained pose synthesis
Enables efficient and consistent relighting
Provides a practical solution for virtual fashion display
Abstract
Realistic and controllable garment visualization is critical for fashion e-commerce, where users expect personalized previews under diverse poses and lighting conditions. Existing methods often rely on predefined poses, limiting semantic flexibility and illumination adaptability. To address this, we introduce FashionPose, the first unified text-to-pose-to-relighting generation framework. Given a natural language description, our method first predicts a 2D human pose, then employs a diffusion model to generate high-fidelity person images, and finally applies a lightweight relighting module, all guided by the same textual input. By replacing explicit pose annotations with text-driven conditioning, FashionPose enables accurate pose alignment, faithful garment rendering, and flexible lighting control. Experiments demonstrate fine-grained pose synthesis and efficient, consistent relighting,…
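The three-stage flow described in the abstract can be sketched as a simple function composition. This is only an illustrative outline under stated assumptions, not the authors' implementation: the names `Pose2D`, `Image`, `run_pipeline`, and the three stub stages are hypothetical, and the stubs return toy values in place of a real pose predictor, diffusion model, and relighting module. The one property taken from the abstract is that every stage receives the same text prompt.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Keypoint = Tuple[float, float]  # (x, y) in normalized image coordinates


@dataclass
class Pose2D:
    """Hypothetical container for a predicted 2D human pose."""
    keypoints: List[Keypoint]


@dataclass
class Image:
    """Hypothetical grayscale image; a stand-in for a real image tensor."""
    pixels: List[List[float]]  # values in [0, 1]


def run_pipeline(
    prompt: str,
    predict_pose: Callable[[str], Pose2D],
    generate_image: Callable[[str, Pose2D], Image],
    relight: Callable[[str, Image], Image],
) -> Image:
    """Chain the three stages, passing the same prompt to each one."""
    pose = predict_pose(prompt)            # stage 1: text -> 2D pose
    image = generate_image(prompt, pose)   # stage 2: text + pose -> person image
    return relight(prompt, image)          # stage 3: text + image -> relit image


# Toy stand-ins (assumptions for illustration only).
def stub_predict_pose(prompt: str) -> Pose2D:
    # A real model would infer keypoints from the prompt; this is fixed.
    return Pose2D(keypoints=[(0.5, 0.1), (0.5, 0.5), (0.5, 0.9)])


def stub_generate_image(prompt: str, pose: Pose2D) -> Image:
    # A real diffusion model would render a person; this is a flat 4x4 image.
    return Image(pixels=[[0.5] * 4 for _ in range(4)])


def stub_relight(prompt: str, image: Image) -> Image:
    # Brighten if the prompt asks for it, clamping values to 1.0.
    gain = 1.5 if "bright" in prompt else 1.0
    return Image(pixels=[[min(1.0, p * gain) for p in row] for row in image.pixels])


if __name__ == "__main__":
    result = run_pipeline(
        "a model walking, bright studio light",
        stub_predict_pose,
        stub_generate_image,
        stub_relight,
    )
    print(result.pixels[0][0])
```

Passing the stages in as callables keeps the sketch independent of any particular model: each stub could be swapped for a real component without changing `run_pipeline`.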
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Fashion and Cultural Textiles
