The Effect of Document Summarization on LLM-Based Relevance Judgments

Samaneh Mohtadi, Kevin Roitero, Stefano Mizzaro, and Gianluca Demartini

arXiv:2512.05334 · cs.IR · December 8, 2025

Open Access

TL;DR

This study explores how text summarization influences the reliability of LLM-based relevance judgments in IR evaluation, finding that summaries can maintain ranking stability but also introduce biases affecting judgment accuracy.

Contribution

The paper systematically evaluates the impact of document summarization on LLM-based relevance assessments across multiple datasets and highlights the implications for the reliability of IR evaluation.

Findings

01. Summary-based judgments maintain system ranking stability.

02. Summarization introduces biases and shifts in label distributions.

03. Summarization offers a more efficient approach for large-scale IR evaluation.
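Ranking stability (finding 01) is typically checked by scoring the same set of IR systems twice, once with judgments made from full documents and once with judgments made from summaries, and then correlating the two system orderings. This page does not say which correlation measure the paper reports; Kendall's τ is a common choice in IR evaluation, so the sketch below assumes it. It implements τ-a (pairs tied in either ordering count as neither concordant nor discordant). The system names and scores are hypothetical toy values, not results from the paper.

```python
from itertools import combinations


def kendall_tau(scores_a: dict[str, float], scores_b: dict[str, float]) -> float:
    """Kendall's tau-a between two system orderings induced by effectiveness scores.

    scores_a and scores_b map a system name to its score (e.g. mean nDCG)
    computed under two different sets of relevance judgments.
    """
    systems = sorted(scores_a.keys() & scores_b.keys())
    n = len(systems)
    if n < 2:
        raise ValueError("need at least two systems scored under both judgment sets")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # Same sign of the score difference means both judgment sets order s and t alike.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # A pair tied in either ordering counts as neither (tau-a convention).
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical per-system scores under full-document vs. summary-based judgments.
full_doc = {"sysA": 0.42, "sysB": 0.38, "sysC": 0.31, "sysD": 0.27}
summary = {"sysA": 0.40, "sysB": 0.41, "sysC": 0.30, "sysD": 0.25}
print(round(kendall_tau(full_doc, summary), 3))  # 0.667
```

Here only the sysA/sysB pair swaps order, giving 5 concordant and 1 discordant pair out of 6, so τ = 4/6. A τ-b variant would adjust the denominator for ties; which variant fits depends on how often systems tie.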

Abstract

Relevance judgments are central to the evaluation of Information Retrieval (IR) systems, but obtaining them from human annotators is costly and time-consuming. Large Language Models (LLMs) have recently been proposed as automated assessors, showing promising alignment with human annotations. Most prior studies have treated documents as fixed units, feeding their full content directly to LLM assessors. We investigate how text summarization affects the reliability of LLM-based judgments and their downstream impact on IR evaluation. Using state-of-the-art LLMs across multiple TREC collections, we compare judgments made from full documents with those based on LLM-generated summaries of different lengths. We examine their agreement with human labels, their effect on retrieval effectiveness evaluation, and their influence on IR systems' ranking stability. Our findings show that summary-based…
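The abstract compares LLM judgments (from full documents or from summaries) against human labels. The agreement statistic the paper uses is not stated on this page; Cohen's κ is one common choice for comparing two annotators, so the minimal sketch below assumes it. The 0/1/2 graded relevance scale and the label lists are hypothetical toy values, not data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a: list[int], labels_b: list[int]) -> float:
    """Cohen's kappa between two annotators labelling the same items.

    labels_a and labels_b hold one relevance grade per query-document pair,
    in the same order (e.g. human qrels vs. LLM judgments from summaries).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of pairs given the same grade.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    grades = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[g] * counts_b[g] for g in grades) / (n * n)
    if expected == 1.0:
        # Degenerate case: both annotators used one identical grade throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels on a 0/1/2 graded scale (not data from the paper).
human = [2, 1, 0, 0, 1, 2, 0, 1]
llm_summary = [2, 1, 0, 1, 1, 1, 0, 1]
print(round(cohen_kappa(human, llm_summary), 3))  # 0.61
```

Raw agreement here is 6/8, but because the chance term is built from each annotator's label marginals, a shift in the LLM's label distribution (finding 02) also moves κ, not just the count of matching labels.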

Taxonomy

Topics: Information Retrieval and Search Behavior · Topic Modeling · Expert finding and Q&A systems