GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting

Yukang Cao; Masoud Hadi; Liang Pan; Ziwei Liu

arXiv:2410.05259·cs.CV·October 8, 2024


Open Access · 3 Reviews

TL;DR

GS-VTON introduces a novel 3D virtual try-on method leveraging Gaussian Splatting and diffusion models, enabling consistent, high-quality 3D clothing transfer from 2D models with improved cross-view coherence.

Contribution

The paper presents a new 3D VTON framework using 3D Gaussian Splatting, LoRA fine-tuning, and a reference-driven editing approach, advancing 3D virtual try-on technology.
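As a rough illustration of the LoRA fine-tuning named above, the sketch below shows the general low-rank update idea: a frozen weight matrix `W` is augmented with a trainable product `B @ A`, scaled by `alpha / rank`. This is a generic, dependency-free sketch and not the paper's implementation; the class name `LoRALinear`, the helper `matmul`, and the values of `rank`, `alpha`, and the initialization scale are all assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen linear layer plus a low-rank update (hypothetical illustration)."""

    def __init__(self, weight, rank=2, alpha=2.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained weight, shape (out_dim, in_dim)
        # A is small random and B is zero, so the update is exactly zero at the start.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank  # assumed scaling convention

    def effective_weight(self):
        """Return W + scale * (B @ A), the weight actually applied to inputs."""
        delta = matmul(self.B, self.A)  # shape (out_dim, in_dim)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def __call__(self, x):
        """Apply the layer to a batch x of shape (batch, in_dim)."""
        w_t = [list(col) for col in zip(*self.effective_weight())]  # (in_dim, out_dim)
        return matmul(x, w_t)
```

In a sketch like this, only `A` and `B` would be trained while `weight` stays fixed, which is what keeps the number of trainable parameters small when personalizing a large pre-trained model.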

Findings

01. Achieves superior fidelity in 3D clothing transfer.

02. Ensures cross-view consistency and high-quality geometry.

03. Establishes a new benchmark for 3D VTON evaluation.

Abstract

Diffusion-based 2D virtual try-on (VTON) techniques have recently demonstrated strong performance, while the development of 3D VTON has largely lagged behind. Despite recent advances in text-guided 3D scene editing, integrating 2D VTON into these pipelines to achieve vivid 3D VTON remains challenging. The reasons are twofold. First, text prompts cannot provide sufficient details in describing clothing. Second, 2D VTON results generated from different viewpoints of the same 3D scene lack coherence and spatial relationships, hence frequently leading to appearance inconsistencies and geometric distortions. To resolve these problems, we introduce an image-prompted 3D VTON method (dubbed GS-VTON) which, by leveraging 3D Gaussian Splatting (3DGS) as the 3D representation, enables the transfer of pre-trained knowledge from 2D VTON models to 3D while improving cross-view consistency. (1)…

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 2

Strengths

+ Effectively bridges the gap between 2D VTON and 3D applications by incorporating 3D Gaussian Splatting, which ensures consistency across multi-view images.
+ Uses a personalized diffusion model with LoRA fine-tuning, improving adaptability and customization for different subjects and garments.
+ Presents a new benchmark, 3D-VTONBench, which is an important addition for the comprehensive evaluation of 3D VTON performance. They also demonstrate superior performance over existing methods, partic

Weaknesses

- The model is a very straightforward follow-up of 2D VTON models and inherits some biases from pre-trained 2D VTON models.

Reviewer 02 · Rating 5 · Confidence 4

Strengths

1. GS-VTON is the first 3D virtual try-on method, showing more diverse real-world applications compared to 2D virtual try-on. It holds promising potential to transform online shopping and create positive social impact.
2. GS-VTON utilizes Reference-driven Image Editing and 3D Gaussian editing to ensure the try-on scene is consistent in both texture and geometry across multiple views. The design seems sound.

Weaknesses

1) **Reference-driven Image Editing**: The authors propose this method to ensure texture consistency across multi-view images by integrating attention features from a reference image. However, if the reference image has incorrect textures, it may negatively affect the consistency of subsequent images.
2) **Questionable Experimental Setting**:
   - All benchmark methods use text as input, while GS-VTON uses an image as a prompt. However, the user study criterion of clothing image similarity may

Reviewer 03 · Rating 6 · Confidence 5

Strengths

* The paper presents a groundbreaking method for 3D virtual try-on by extending pre-trained 2D VTON models to 3D using 3DGS, addressing the challenge of cross-view consistency and spatial relationships in 3D scenes.
* The establishment of the 3D-VTONBench dataset is a valuable resource for the research community, facilitating more comprehensive evaluations and fostering further advancements in 3D VTON.
* The method demonstrates superior performance over existing techniques.

Weaknesses

* In some cases, such as the first row in Fig. 1, there are noticeable artifacts on the sleeves and edges of the garments.
* The statements about the effects of persona-aware 3DGS editing are inconsistent between the abstract and introduction. The abstract states "maintain consistent cross-view appearance," while the introduction says "enhancing multi-view consistency."
* What are the differences or advantages of your methods compared to RelFill in "Personalized Diffusion Model via LoRA fine-tu


Taxonomy

Topics · Computer Graphics and Visualization Techniques

Methods · Diffusion