Retrieve, Annotate, Evaluate, Repeat: Leveraging Multimodal LLMs for Large-Scale Product Retrieval Evaluation

Kasra Hosseini; Thomas Kober; Josip Krapac; Roland Vollgraf; Weiwei Cheng; Ana Peleteiro Ramallo

arXiv:2409.11860 · cs.IR · September 19, 2024

Open Access

TL;DR

This paper presents a framework using multimodal large language models to automate and scale the evaluation of product retrieval systems in e-commerce, achieving quality comparable to human annotations while reducing costs.

Contribution

It introduces a novel multimodal LLM-based approach for large-scale product retrieval evaluation, including generating annotation guidelines and conducting annotations, validated on a real e-commerce platform.

Findings

01 Comparable annotation quality to humans

02 Significant reduction in time and cost

03 Effective for large-scale quality control

Abstract

Evaluating production-level retrieval systems at scale is a crucial yet challenging task because large pools of well-trained human annotators are rarely available. Large Language Models (LLMs) have the potential to address this scaling issue and offer a viable alternative to humans for the bulk of annotation tasks. In this paper, we propose a framework for assessing product search engines in a large-scale e-commerce setting, leveraging Multimodal LLMs for (i) generating tailored annotation guidelines for individual queries, and (ii) conducting the subsequent annotation task. Our method, validated through deployment on a large e-commerce platform, demonstrates comparable quality to human annotations, significantly reduces time and cost, facilitates rapid problem discovery, and provides an effective solution for production-level quality control at scale.
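The two-step loop described in the abstract (a per-query guideline, then annotation of the retrieved products) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `Product`, `evaluate_queries`, and the three callables are hypothetical, the stub functions stand in for a real search engine and a real multimodal LLM, and the score (fraction of the top-k results labelled relevant) is a common retrieval metric chosen here, not one the abstract specifies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Product:
    product_id: str
    title: str
    image_url: str  # a multimodal annotator would read this; the stub ignores it


def evaluate_queries(
    queries: List[str],
    retrieve: Callable[[str], List[Product]],
    make_guideline: Callable[[str], str],
    annotate: Callable[[str, str, Product], bool],
    k: int = 3,
) -> Dict[str, float]:
    """Return, per query, the fraction of top-k results labelled relevant."""
    scores: Dict[str, float] = {}
    for query in queries:
        results = retrieve(query)[:k]
        if not results:
            scores[query] = 0.0
            continue
        # Step (i): one guideline per query (an LLM call in a real system).
        guideline = make_guideline(query)
        # Step (ii): label each retrieved product against that guideline.
        labels = [annotate(query, guideline, product) for product in results]
        scores[query] = sum(labels) / len(labels)
    return scores


# Hypothetical stand-ins so the sketch runs without any external service.
CATALOG = [
    Product("p1", "red summer dress", ""),
    Product("p2", "blue jeans", ""),
    Product("p3", "red sneakers", ""),
]


def stub_retrieve(query: str) -> List[Product]:
    return CATALOG  # a real engine would rank products by the query


def stub_guideline(query: str) -> str:
    return f"Mark a product relevant only if it matches every term in: {query}"


def stub_annotate(query: str, guideline: str, product: Product) -> bool:
    # Keyword match on the title replaces the LLM judgement in this sketch.
    title_terms = product.title.split()
    return all(term in title_terms for term in query.split())


scores = evaluate_queries(
    ["red dress", "jeans"], stub_retrieve, stub_guideline, stub_annotate
)
```

Passing the retriever, guideline generator, and annotator as callables keeps the loop unchanged when the stubs are swapped for real components, and lets human labels be plugged into the same `annotate` slot for a side-by-side comparison.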

Taxonomy

Topics: Natural Language Processing Techniques · Web Data Mining and Analysis · Topic Modeling