LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing

Federico Girella; Davide Talon; Ziyue Liu; Zanxi Ruan; Yiming Wang; Marco Cristani

arXiv:2507.22627 · cs.CV · September 5, 2025

TL;DR

This paper introduces LOTS, a method for fashion image generation that combines localized sketch and text conditioning to produce highly customizable fashion outlooks, supported by a new dataset (Sketchy) and a diffusion guidance mechanism.

Contribution

The paper presents a new approach that integrates localized sketch-text conditioning with diffusion models for fashion image synthesis, along with a new dataset, Sketchy, for training and evaluation.

Findings

01. Achieves state-of-the-art performance on global and localized metrics
02. Enables unprecedented levels of design customization
03. Demonstrates effective multi-condition diffusion guidance

Abstract

Fashion design is a complex creative process that blends visual and textual expressions. Designers convey ideas through sketches, which define spatial structure and design elements, and textual descriptions, capturing material, texture, and stylistic details. In this paper, we present LOcalized Text and Sketch for fashion image generation (LOTS), an approach for compositional sketch-text based generation of complete fashion outlooks. LOTS leverages a global description with paired localized sketch + text information for conditioning and introduces a novel step-based merging strategy for diffusion adaptation. First, a Modularized Pair-Centric representation encodes sketches and text into a shared latent space while preserving independent localized features; then, a Diffusion Pair Guidance phase integrates both local and global conditioning via attention-based guidance within the…
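The abstract only names the two stages, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. Each localized sketch-text pair is projected into a shared d-dimensional token space and kept as its own group of tokens. Image latent tokens then cross-attend to the global description tokens together with every pair's tokens. All names (`encode_pair`, `pair_cross_attention`), all dimensions, and the choice of plain concatenation plus single-head scaled dot-product attention are hypothetical. The paper's step-based merging strategy is not modeled here.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def encode_pair(sketch_feats, text_feats, w_sketch, w_text):
    """Hypothetical pair encoder: project one sketch and its paired text into a
    shared d-dim space and return them as one token group of shape (n_s + n_t, d)."""
    return np.concatenate([sketch_feats @ w_sketch, text_feats @ w_text], axis=0)


def pair_cross_attention(image_tokens, global_tokens, pair_groups, w_q, w_k, w_v):
    """Single-head cross-attention from image latents to the global description
    tokens plus every localized pair's tokens (a simplification, not LOTS itself)."""
    cond = np.concatenate([global_tokens, *pair_groups], axis=0)
    q, k, v = image_tokens @ w_q, cond @ w_k, cond @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v, attn


# Toy usage with random placeholder features and weights (all sizes hypothetical).
rng = np.random.default_rng(0)
d = 16
w_sketch = rng.normal(size=(32, d))  # sketch feature dim 32 -> shared dim d
w_text = rng.normal(size=(24, d))    # text feature dim 24 -> shared dim d
pairs = [
    encode_pair(rng.normal(size=(4, 32)), rng.normal(size=(6, 24)), w_sketch, w_text)
    for _ in range(3)  # three localized sketch-text pairs, e.g. top, skirt, shoes
]
global_tokens = rng.normal(size=(8, d))   # tokens of the global outfit description
image_tokens = rng.normal(size=(64, d))   # e.g. an 8x8 latent grid, flattened
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

out, attn = pair_cross_attention(image_tokens, global_tokens, pairs, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (64, 16) (64, 38)
```

Keeping each pair's tokens as a separate group until the attention step is one simple way to preserve independent localized features. A real model would presumably use learned sketch and text encoders (the random matrices above are placeholders) and apply such conditioning inside the denoiser during diffusion.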
