Leveraging LLMs to Evaluate Usefulness of Document

Xingzhu Wang; Erhan Zhang; Yiqun Chen; Jinghan Xuan; Yucheng Hou; Yitong Xu; Ying Nie; Shuaiqiang Wang; Dawei Yin; Jiaxin Mao

arXiv:2506.08626·cs.IR·June 12, 2025

TL;DR

This paper presents a novel user-centric evaluation framework leveraging LLMs to generate multilevel usefulness labels by incorporating user context and behavior, improving relevance assessment and satisfaction prediction.

Contribution

The paper introduces an LLM-based evaluation method that integrates users' search context and behavioral data to assess usefulness, and reports that it surpasses third-party labeling methods.

Findings

01. LLMs can accurately evaluate usefulness with proper context guidance

02. The approach outperforms third-party labeling methods in usefulness assessment

03. Use of generated labels improves satisfaction prediction models

Abstract

The conventional Cranfield paradigm struggles to capture user satisfaction because relevance correlates only weakly with satisfaction, and relevance annotation makes building test collections costly. To tackle these issues, our research explores the potential of leveraging large language models (LLMs) to generate multilevel usefulness labels for evaluation. We introduce a new user-centric evaluation framework that integrates users' search context and behavioral data into LLMs. This framework uses a cascading judgment structure designed for multilevel usefulness assessments, drawing inspiration from ordinal regression techniques. Our study demonstrates that, when well guided with context and behavioral information, LLMs can accurately evaluate usefulness, allowing our approach to surpass third-party labeling methods. Furthermore, we conduct ablation studies…
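
The abstract does not spell out the cascading judgment structure, so the following is only a minimal Python sketch of one common ordinal-regression-style reading: a multilevel label is obtained from a chain of binary questions of the form "is the usefulness above level k?", stopping at the first "no". The names `cascade_label` and `make_toy_judge`, the four-level scale, and the click and dwell-time thresholds are all hypothetical. The toy judge stands in for an LLM call that would receive the search context and behavioral data; it is not the paper's prompt or model.

```python
from typing import Callable


def cascade_label(exceeds: Callable[[int], bool], num_levels: int = 4) -> int:
    """Turn binary threshold judgments into one ordinal label in [0, num_levels - 1].

    `exceeds(k)` answers "is the usefulness strictly above level k?".
    Questions are asked in ascending order and the cascade stops at the first "no".
    """
    label = 0
    for threshold in range(num_levels - 1):
        if not exceeds(threshold):
            break
        label = threshold + 1
    return label


def make_toy_judge(clicked: bool, dwell_seconds: float) -> Callable[[int], bool]:
    """Hypothetical stand-in for an LLM judge (assumption, not the paper's method).

    It derives a hidden score from two behavioral signals and answers
    threshold questions against it. The cut-offs of 10 s and 60 s are arbitrary.
    """
    if not clicked:
        score = 0
    else:
        score = 1 + int(dwell_seconds >= 10) + int(dwell_seconds >= 60)
    return lambda threshold: score > threshold


if __name__ == "__main__":
    for clicked, dwell in [(False, 0), (True, 5), (True, 30), (True, 120)]:
        label = cascade_label(make_toy_judge(clicked, dwell))
        print(f"clicked={clicked}, dwell={dwell}s -> usefulness level {label}")
```

One reason to prefer a cascade over a single multi-class question is that each step is a simpler yes/no decision, and the ordering of the levels is enforced by construction rather than left to the judge.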

Taxonomy

Topics: Information Retrieval and Search Behavior · Recommender Systems and Techniques · Text Readability and Simplification