TL;DR
Dress-ED introduces a comprehensive benchmark dataset and a unified diffusion-based framework for instruction-guided virtual garment editing, enabling controllable and interactive fashion synthesis.
Contribution
It provides the first large-scale dataset combining VTON, VTOFF, and text-guided editing, along with a baseline model for instruction-driven fashion editing.
Findings
The dataset contains over 146k verified quadruplets across three garment categories.
The proposed framework effectively reasons over instructions and visual cues for editing.
Code and dataset will be publicly available.
Abstract
Recent advances in Virtual Try-On (VTON) and Virtual Try-Off (VTOFF) have greatly improved photo-realistic fashion synthesis and garment reconstruction. However, existing datasets remain static and lack the instruction-driven editing needed for controllable, interactive fashion generation. In this work, we introduce the Dress Editing Dataset (Dress-ED), the first large-scale benchmark that unifies VTON, VTOFF, and text-guided garment editing within a single framework. Each sample in Dress-ED includes an in-shop garment image, the corresponding person image wearing the garment, their edited counterparts, and a natural-language instruction describing the desired modification. Built through a fully automated multimodal pipeline that integrates MLLM-based garment understanding, diffusion-based editing, and LLM-guided verification, Dress-ED comprises over 146k verified quadruplets spanning three garment…
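To make the sample layout and the three pipeline stages named in the abstract concrete, here is a minimal Python sketch. It is not the authors' code: the class `DressEDSample`, its field names, the function `build_samples`, and the three callables (`propose_instruction`, `apply_edit`, `verify`) are all hypothetical stand-ins for the MLLM, diffusion, and LLM components, whose actual interfaces the source does not describe.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class DressEDSample:
    """One sample as described in the abstract. All field names are assumptions."""
    garment_path: str         # in-shop garment image
    person_path: str          # person wearing that garment
    edited_garment_path: str  # garment after the edit
    edited_person_path: str   # person wearing the edited garment
    instruction: str          # natural-language description of the edit
    category: str             # garment category label (not named in the source)


def build_samples(
    pairs: Iterable[Tuple[str, str, str]],
    propose_instruction: Callable[[str, str], str],
    apply_edit: Callable[[str, str], str],
    verify: Callable[[DressEDSample], bool],
) -> Iterator[DressEDSample]:
    """Yield only the samples that pass verification.

    pairs yields (garment_path, person_path, category) tuples.
    The callables are placeholders for the three stages:
      propose_instruction: garment understanding -> edit instruction
      apply_edit:          image + instruction   -> edited image path
      verify:              accept or reject the assembled sample
    """
    for garment, person, category in pairs:
        instruction = propose_instruction(garment, category)
        sample = DressEDSample(
            garment_path=garment,
            person_path=person,
            edited_garment_path=apply_edit(garment, instruction),
            edited_person_path=apply_edit(person, instruction),
            instruction=instruction,
            category=category,
        )
        if verify(sample):
            yield sample
```

Passing the stages in as callables keeps the sketch independent of any particular model. It also mirrors the abstract's description of verification as a filter applied after editing, so rejected samples never reach the final set.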
