WAS-VTON: Warping Architecture Search for Virtual Try-on Network
Zhenyu Xie, Xujie Zhang, Fuwei Zhao, Haoye Dong, Michael C. Kampffmeyer, Haonan Yan, Xiaodan Liang

TL;DR
WAS-VTON introduces a neural architecture search approach to find clothing category-specific warping networks for virtual try-on, resulting in more natural and accurate try-on images.
Contribution
The paper proposes a novel NAS-based framework that automatically designs warping and fusion modules tailored for different clothing categories in virtual try-on.
Findings
Outperforms previous fixed-architecture methods in naturalness and accuracy
Achieves better clothing-person alignment through optimized warping networks
Produces more seamless and realistic virtual try-on results
Abstract
Despite recent progress on image-based virtual try-on, current methods are constrained by shared warping networks and thus fail to synthesize natural try-on results when faced with clothing categories that require different warping operations. In this paper, we address this problem by finding clothing category-specific warping networks for the virtual try-on task via Neural Architecture Search (NAS). We introduce a NAS-Warping Module and elaborately design a bilevel hierarchical search space to identify the optimal network-level and operation-level flow estimation architecture. Given the network-level search space, containing different numbers of warping blocks, and the operation-level search space with different convolution operations, we jointly learn a combination of repeatable warping cells and convolution operations specifically for the clothing-person alignment. Moreover, a…
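The bilevel search space described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the candidate block counts, the operation names, the per-block operation choice, and the softmax-then-argmax selection (in the style of differentiable NAS) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from dataclasses import dataclass

# Hypothetical candidates; the abstract does not list the actual choices.
BLOCK_COUNTS = [1, 2, 3]                                   # network level
CONV_OPS = ["conv3x3", "conv5x5", "dilated_conv3x3"]       # operation level


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


@dataclass
class Architecture:
    num_blocks: int
    ops: list


def search_space_size(block_counts, conv_ops):
    """Count discrete architectures when each block picks one operation."""
    return sum(len(conv_ops) ** n for n in block_counts)


def discretize(network_logits, op_logits):
    """Select the most probable choice at both levels (assumed rule).

    network_logits: one score per entry of BLOCK_COUNTS.
    op_logits: one list of scores (per entry of CONV_OPS) for each block.
    """
    num_blocks = BLOCK_COUNTS[argmax(softmax(network_logits))]
    ops = [CONV_OPS[argmax(softmax(scores))] for scores in op_logits[:num_blocks]]
    return Architecture(num_blocks, ops)


# Example: scores favour two blocks, conv5x5 in the first, conv3x3 in the second.
arch = discretize([0.1, 2.0, 0.3], [[0.0, 1.0, 0.0], [3.0, 0.0, 0.0], [0.0, 0.0, 5.0]])
```

In a category-specific search, one such set of scores would be learned per clothing category, so that each category ends up with its own block count and operation choices.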
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Human Pose and Action Recognition
Methods: Convolution
