Multi-stage Large Language Model Pipelines Can Outperform GPT-4o in Relevance Assessment

Julian A. Schnabel; Johanne R. Trippas; Falk Scholer; Danula Hettiachchi

arXiv:2501.14296·cs.IR·January 27, 2025

TL;DR

This paper introduces a multi-stage LLM pipeline for relevance assessment that outperforms GPT-4o in accuracy and cost-efficiency, providing a scalable alternative to human annotation.

Contribution

The authors develop a modular, multi-stage LLM pipeline that improves relevance assessment accuracy and reduces costs compared to existing GPT-4o methods.

Findings

01

Achieved an 18.4% increase in accuracy, measured by Krippendorff's α, over GPT-4o mini on TREC-DL.

02

Maintained a low cost of about 0.2 USD per million input tokens.

03

Enhanced GPT-4o's accuracy by 9.7% using the pipeline approach.
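
The findings above are reported in terms of Krippendorff's α, which compares observed disagreement between annotators with the disagreement expected by chance (α = 1 − D_o / D_e). The following is a minimal sketch of the nominal variant only, assuming every item carries at least two labels (for example, one human label and one LLM label). The function name `krippendorff_alpha_nominal` and the input layout are illustrative assumptions; the paper may well use a different distance metric, such as an ordinal one for graded relevance.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Nominal Krippendorff's alpha (illustrative sketch, not the paper's code).

    units: list of lists, one inner list of labels per judged item.
    Items with fewer than two labels cannot form pairs and are skipped.
    """
    # Coincidence matrix: each ordered pair of labels within an item
    # contributes 1 / (m - 1), where m is the number of labels on that item.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and the overall number of pairable labels.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (the shared 1/n factor cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)

    if expected == 0:
        return float("nan")  # only one category present: alpha is undefined
    return 1.0 - observed / expected
```

An α of 1 indicates perfect agreement, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement.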

Abstract

The effectiveness of search systems is evaluated using relevance labels that indicate the usefulness of documents for specific queries and users. While obtaining these relevance labels from real users is ideal, scaling such data collection is challenging. Consequently, third-party annotators are employed, but their inconsistent accuracy demands costly auditing, training, and monitoring. We propose an LLM-based modular classification pipeline that divides the relevance assessment task into multiple stages, each utilising different prompts and models of varying sizes and capabilities. Applied to TREC Deep Learning (TREC-DL), one of our approaches showed an 18.4% Krippendorff's $α$ accuracy increase over OpenAI's GPT-4o mini while maintaining a cost of about 0.2 USD per million input tokens, offering a more efficient and scalable solution for relevance assessment. This approach beats…
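
The modular idea described in the abstract, where each stage uses its own prompt and model, can be sketched as a simple cascade. This is a hedged illustration only: the stage names, the `Stage` and `run_pipeline` helpers, the 0 to 3 label scale, and the binary-filter-then-grade ordering are all assumptions, and the two stage functions are plain term-overlap stand-ins where a real pipeline would call an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stage:
    """One pipeline stage (hypothetical structure, not the authors' code)."""
    name: str
    # Returns a relevance label, or None to defer to the next stage.
    classify: Callable[[str, str], Optional[int]]


def _overlap(query: str, passage: str) -> int:
    """Count shared lowercase terms; a toy stand-in for a model judgement."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def cheap_binary_filter(query: str, passage: str) -> Optional[int]:
    # Stand-in for a small, cheap model: reject clearly irrelevant passages,
    # defer everything else to the next stage.
    return 0 if _overlap(query, passage) == 0 else None


def fine_grader(query: str, passage: str) -> Optional[int]:
    # Stand-in for a larger model that assigns a graded label (assumed 0-3).
    return min(3, _overlap(query, passage))


def run_pipeline(
    stages: List[Stage], query: str, passage: str, default: int = 0
) -> Tuple[int, str]:
    """Return the first non-deferred label and the stage that produced it."""
    for stage in stages:
        label = stage.classify(query, passage)
        if label is not None:
            return label, stage.name
    return default, "default"


STAGES = [
    Stage("binary_filter", cheap_binary_filter),
    Stage("graded_judge", fine_grader),
]
```

For example, `run_pipeline(STAGES, "llm relevance", "cooking pasta")` stops at the first stage, so the more expensive second stage is never invoked for that pair. Routing easy cases to a cheaper stage is one plausible way such a pipeline could keep the cost per token low.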

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)