What should I wear to a party in a Greek taverna? Evaluation for Conversational Agents in the Fashion Domain
Antonis Maronikolakis, Ana Peleteiro Ramallo, Weiwei Cheng, Thomas Kober

TL;DR
This paper introduces a multilingual dataset of 4,000 fashion-related conversations to evaluate large language models' ability to serve as conversational agents in online fashion retail, focusing on their capacity to interact with backend systems.
Contribution
It presents a new high-quality, multilingual dataset for evaluating LLMs in fashion e-commerce and demonstrates its utility in assessing models' capabilities for practical deployment.
Findings
The dataset effectively scales to business needs.
LLMs show varying performance in calling backend systems.
The dataset facilitates iterative development of conversational tools.
Abstract
Large language models (LLMs) are poised to revolutionize the domain of online fashion retail, enhancing customer experience and discovery of fashion online. LLM-powered conversational agents introduce a new way of discovery by directly interacting with customers, enabling them to express themselves in their own way, refine their needs, and obtain fashion and shopping advice that is relevant to their taste and intent. For many tasks in e-commerce, such as finding a specific product, conversational agents need to convert their interactions with a customer into specific calls to different backend systems, e.g., a search system that showcases a relevant set of products. Therefore, evaluating the capabilities of LLMs to perform those tasks related to calling other services is vital. However, those evaluations are generally complex due to the lack of relevant and high-quality datasets, and do not align…
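To make "converting a conversation into a backend call" concrete, the sketch below shows one way a predicted search call could be scored against a reference call. It is a minimal illustration under stated assumptions, not the paper's dataset format or evaluation metric: the `SearchCall` structure, its `query`/`filters` fields, and the `score_call` function are all hypothetical, and the scoring (normalized query exact match plus precision/recall over filter key-value pairs) is just one reasonable choice.

```python
from dataclasses import dataclass, field


@dataclass
class SearchCall:
    """Hypothetical structured call an agent might emit for a product-search backend."""
    query: str
    filters: dict[str, str] = field(default_factory=dict)


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences are not errors."""
    return " ".join(text.lower().split())


def score_call(predicted: SearchCall, reference: SearchCall) -> dict[str, float]:
    """Compare a predicted backend call with a reference call (illustrative metric only).

    Filter keys must match exactly; filter values are normalized before comparison.
    Precision (recall) is treated as vacuously 1.0 when the prediction (reference)
    has no filters.
    """
    query_match = float(normalize(predicted.query) == normalize(reference.query))
    pred_pairs = {(k, normalize(v)) for k, v in predicted.filters.items()}
    ref_pairs = {(k, normalize(v)) for k, v in reference.filters.items()}
    overlap = len(pred_pairs & ref_pairs)
    precision = overlap / len(pred_pairs) if pred_pairs else 1.0
    recall = overlap / len(ref_pairs) if ref_pairs else 1.0
    exact = float(query_match == 1.0 and pred_pairs == ref_pairs)
    return {
        "query_match": query_match,
        "filter_precision": precision,
        "filter_recall": recall,
        "exact_match": exact,
    }


# Hypothetical example: a customer asks what to wear to a party in a Greek taverna.
reference = SearchCall(query="summer party dress",
                       filters={"category": "dresses", "occasion": "party"})
predicted = SearchCall(query="Summer  party dress",
                       filters={"category": "Dresses", "color": "white"})

print(score_call(predicted, reference))
# {'query_match': 1.0, 'filter_precision': 0.5, 'filter_recall': 0.5, 'exact_match': 0.0}
```

Separating the query from structured filters lets a partially correct call (right query, one wrong filter) earn partial credit instead of failing an all-or-nothing string comparison.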
