GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections

Shiyue Zhang, Zheng Chong, Xujie Zhang, Hanhui Li, Yuhao Cheng, Yiqiang Yan, and Xiaodan Liang

arXiv:2408.12352·cs.CV·August 26, 2024

TL;DR

GarmentAligner is a novel diffusion model that improves text-to-garment image generation by using retrieval-augmented multi-level corrections for fine-grained semantic and component alignment.

Contribution

It introduces an automatic component extraction pipeline and multi-level correction losses to enhance semantic, spatial, and quantitative alignment in garment generation.

Findings

01. Achieves superior fidelity compared to existing models.

02. Enhances fine-grained semantic alignment of garment components.

03. Utilizes retrieval augmentation and contrastive learning for better perception.

Abstract

General text-to-image models bring revolutionary innovation to the fields of arts, design, and media. However, when applied to garment generation, even the state-of-the-art text-to-image models suffer from fine-grained semantic misalignment, particularly concerning the quantity, position, and interrelations of garment components. Addressing this, we propose GarmentAligner, a text-to-garment diffusion model trained with retrieval-augmented multi-level corrections. To achieve semantic alignment at the component level, we introduce an automatic component extraction pipeline to obtain spatial and quantitative information of garment components from corresponding images and captions. Subsequently, to exploit component relationships within the garment images, we construct retrieval subsets for each garment by retrieval augmentation based on component-level similarity ranking and conduct…
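
The retrieval step mentioned in the abstract (ranking garments by component-level similarity to build a retrieval subset) can be pictured with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not state the similarity measure, so cosine similarity over per-garment component counts is swapped in, and the names `component_similarity`, `build_retrieval_subset`, and the toy `garments` data are hypothetical rather than taken from the paper.

```python
import math


def component_similarity(a, b):
    """Cosine similarity between two component-count dicts.

    Assumption: each garment is summarized as {component_name: count}.
    The paper's actual similarity measure is not given in the abstract.
    """
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_retrieval_subset(query_id, garments, k=1):
    """Rank all other garments by similarity to the query garment.

    Returns the k most similar and k least similar garment ids
    (hypothetical structure for a retrieval subset).
    """
    query = garments[query_id]
    ranked = sorted(
        (gid for gid in garments if gid != query_id),
        key=lambda gid: component_similarity(query, garments[gid]),
        reverse=True,
    )
    return {"most_similar": ranked[:k], "least_similar": ranked[-k:]}


# Toy data (hypothetical): component counts per garment.
garments = {
    "shirt_a": {"button": 5, "pocket": 1, "collar": 1},
    "shirt_b": {"button": 5, "pocket": 2, "collar": 1},
    "jacket": {"zipper": 1, "pocket": 2, "hood": 1},
    "dress": {"belt": 1},
}

subset = build_retrieval_subset("shirt_a", garments, k=1)
print(subset)
```

In this toy setup, garments that share many components with the query rank first and garments that share none rank last; how such subsets feed into training is described in the part of the abstract that is cut off above, so it is not sketched here.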

Taxonomy

Topics: Handwritten Text Recognition Techniques · Human Motion and Animation

Methods: Diffusion · Contrastive Learning