An Empirical Analysis of GPT-4V's Performance on Fashion Aesthetic Evaluation
Yuki Hirakawa, Takashi Wada, Kazuya Morishita, Ryotaro Shimizu, Takuya Furusawa, Sai Htaung Kham, Yuki Saito

TL;DR
This paper investigates GPT-4V's zero-shot ability to evaluate fashion aesthetics, showing that its predictions align fairly well with human judgments but that it struggles to rank outfits in similar colors.
Contribution
First empirical study of GPT-4V's performance on fashion aesthetic evaluation, highlighting its strengths and weaknesses in zero-shot settings.
Findings
GPT-4V's predictions align fairly well with human judgments
Struggles with ranking outfits of similar colors
Provides a baseline for future research
Abstract
Fashion aesthetic evaluation is the task of estimating how well the outfits worn by individuals in images suit them. In this work, we examine the zero-shot performance of GPT-4V on this task for the first time. We show that its predictions align fairly well with human judgments on our datasets, and also find that it struggles with ranking outfits in similar colors. The code is available at https://github.com/st-tech/gpt4v-fashion-aesthetic-evaluation.
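One way to make "alignment with human judgments" concrete is to count how often a model and human raters order a pair of outfits the same way. The sketch below is illustrative only: the abstract does not say which metric the authors use, and the function name `pairwise_agreement` and all scores are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(model_scores, human_scores):
    """Fraction of outfit pairs that the model and humans order the same way.

    Assumption: each list holds one numeric score per outfit, in the same
    order. Pairs tied in either list are skipped.
    """
    if len(model_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(model_scores)), 2):
        m = model_scores[i] - model_scores[j]
        h = human_scores[i] - human_scores[j]
        if m == 0 or h == 0:
            continue  # no preference expressed by one side
        total += 1
        if (m > 0) == (h > 0):
            agree += 1
    return agree / total if total else float("nan")


# Hypothetical scores for four outfits (not data from the paper).
model = [4.0, 2.5, 3.0, 1.0]
human = [5.0, 2.0, 4.0, 3.0]
print(pairwise_agreement(model, human))
```

A value near 1.0 means the two rankings almost always agree, and 0.5 is roughly what random ordering would give. Restricting the pairs to outfits in similar colors would be one way to probe the weakness the abstract reports.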
Taxonomy
Topics: Consumer Perception and Purchasing Behavior · Cultural and Historical Studies · Diverse Topics in Contemporary Research
Methods: ALIGN
