Reference-Free Image Quality Assessment for Virtual Try-On via Human Feedback

Yuki Hirakawa; Takashi Wada; Ryotaro Shimizu; Takuya Furusawa; Yuki Saito; Ryosuke Araki; Tianwei Chen; Fan Mo; Yoshimitsu Aoki

arXiv:2603.13057 · cs.CV · March 16, 2026

Open Access

TL;DR

This paper introduces VTON-IQA, a reference-free, human-aligned image quality assessment framework for virtual try-on systems, supported by a large-scale annotated benchmark, enabling reliable evaluation without ground-truth images.

Contribution

The paper presents the first large-scale human-annotated benchmark for virtual try-on quality assessment, together with VTON-IQA, a transformer-based model that uses cross-attention to predict perceptual quality.

Findings

01. VTON-IQA achieves reliable human-aligned quality predictions.

02. Benchmark evaluation reveals strengths and weaknesses of 14 VTON models.

03. The dataset contains over 62,000 annotated try-on images.

Abstract

Given a person image and a garment image, image-based Virtual Try-ON (VTON) synthesizes a try-on image of the person wearing the target garment. As VTON systems become increasingly important in practical applications such as fashion e-commerce, reliable evaluation of their outputs has emerged as a critical challenge. In real-world scenarios, ground-truth images of the same person wearing the target garment are typically unavailable, making reference-based evaluation impractical. Moreover, widely used distribution-level metrics such as Fréchet Inception Distance and Kernel Inception Distance measure dataset-level similarity and fail to reflect the perceptual quality of individual generated images. To address these limitations, we propose Image Quality Assessment for Virtual Try-On (VTON-IQA), a reference-free framework for human-aligned, image-level quality assessment without requiring…
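The abstract's point that Fréchet-style metrics are distribution-level can be illustrated with a small sketch. It is not the paper's code and not a full FID implementation: it assumes one-dimensional features so that it needs only the Python standard library, whereas real FID uses multivariate Inception features and a matrix square root. The function name `frechet_distance_1d` and the synthetic feature lists are hypothetical.

```python
import math
import random
import statistics


def frechet_distance_1d(feats_a, feats_b):
    """Squared Frechet distance between 1-D Gaussians fitted to two feature sets.

    Simplifying assumption: features are scalars. For univariate Gaussians the
    distance reduces to (mu_a - mu_b)^2 + var_a + var_b - 2*sqrt(var_a*var_b).
    """
    mu_a, mu_b = statistics.fmean(feats_a), statistics.fmean(feats_b)
    var_a, var_b = statistics.pvariance(feats_a), statistics.pvariance(feats_b)
    return (mu_a - mu_b) ** 2 + var_a + var_b - 2.0 * math.sqrt(var_a * var_b)


# Synthetic stand-ins for per-image features (hypothetical data).
rng = random.Random(0)
reference_feats = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# Shift every feature by 0.5 to mimic a generator with a systematic bias.
shifted_feats = [x + 0.5 for x in reference_feats]

# Each call consumes two whole sets and returns a single number for the pair.
d_same = frechet_distance_1d(reference_feats, reference_feats)
d_shifted = frechet_distance_1d(reference_feats, shifted_feats)
```

The function takes whole sets of features and returns one scalar per pair of sets, so it cannot say which individual image is good or bad. That is the gap an image-level, reference-free score is meant to fill.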

Taxonomy

Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis