HiGarment: Cross-modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image

Junyi Guo; Jingxuan Zhang; Fangyu Wu; Huanda Lu; Qiufeng Wang; Wenmian Yang; Eng Gee Lim; Dongming Lu

arXiv:2505.23186 · cs.CV · August 12, 2025

TL;DR

HiGarment is a diffusion-based framework that generates realistic garment images from flat sketches combined with textual guidance. It targets two challenges: preserving fine-grained fabric detail and keeping garment attributes consistent when the sketch and the text conflict.

Contribution

This paper presents HiGarment, a model that combines multi-modal enhancement with cross-attention mechanisms, together with a dataset the authors describe as the largest for garment generation. Both are aimed at improving the realism and controllability of garment synthesis.

Findings

01. Effective synthesis of realistic garments from sketches and text.
02. Outperforms existing methods in fabric detail preservation.
03. User studies confirm high quality and controllability.

Abstract

Diffusion-based garment synthesis tasks primarily focus on the design phase in the fashion domain, while the garment production process remains largely underexplored. To bridge this gap, we introduce a new task: Flat Sketch to Realistic Garment Image (FS2RG), which generates realistic garment images by integrating flat sketches and textual guidance. FS2RG presents two key challenges: 1) fabric characteristics are solely guided by textual prompts, providing insufficient visual supervision for diffusion-based models, which limits their ability to capture fine-grained fabric details; 2) flat sketches and textual guidance may provide conflicting information, requiring the model to selectively preserve or modify garment attributes while maintaining structural coherence. To tackle this task, we propose HiGarment, a novel framework that comprises two core components: i) a multi-modal semantic…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis
