Leveraging LLMs to Evaluate Usefulness of Document
Xingzhu Wang, Erhan Zhang, Yiqun Chen, Jinghan Xuan, Yucheng Hou, Yitong Xu, Ying Nie, Shuaiqiang Wang, Dawei Yin, Jiaxin Mao

TL;DR
This paper presents a novel user-centric evaluation framework leveraging LLMs to generate multilevel usefulness labels by incorporating user context and behavior, improving relevance assessment and satisfaction prediction.
Contribution
The paper introduces an LLM-based evaluation method that integrates users' search context and behavioral data to produce multilevel usefulness labels, and reports that it outperforms third-party labeling.
Findings
LLMs can accurately evaluate usefulness with proper context guidance
The approach outperforms third-party labeling methods in usefulness assessment
Use of generated labels improves satisfaction prediction models
Abstract
The conventional Cranfield paradigm struggles to effectively capture user satisfaction due to its weak correlation between relevance and satisfaction, alongside the high costs of relevance annotation in building test collections. To tackle these issues, our research explores the potential of leveraging large language models (LLMs) to generate multilevel usefulness labels for evaluation. We introduce a new user-centric evaluation framework that integrates users' search context and behavioral data into LLMs. This framework uses a cascading judgment structure designed for multilevel usefulness assessments, drawing inspiration from ordinal regression techniques. Our study demonstrates that when well-guided with context and behavioral information, LLMs can accurately evaluate usefulness, allowing our approach to surpass third-party labeling methods. Furthermore, we conduct ablation studies…
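The cascading judgment structure mentioned in the abstract can be illustrated with a minimal sketch, assuming it works like a common ordinal-regression decomposition: a multilevel label is built from a chain of binary "does usefulness exceed level k?" decisions, stopping at the first "no". The paper's actual prompts, number of levels, and stopping rule are not given in this summary. Everything named below (`cascade_label`, `make_threshold_judge`, the `dwell_seconds` field, and the thresholds) is hypothetical, and a simple threshold rule stands in for each LLM call.

```python
from typing import Callable, Dict, List

# A judge answers one binary question: "does usefulness exceed this level?"
# In the framework described above this would be an LLM call that sees the
# query, document, search context, and behavior; here it is a stand-in.
Judge = Callable[[Dict], bool]


def cascade_label(sample: Dict, judges: List[Judge]) -> int:
    """Return an ordinal label in 0..len(judges).

    Judges are asked in order of increasing level; the cascade stops at
    the first judge that answers False.
    """
    level = 0
    for exceeds_level in judges:
        if not exceeds_level(sample):
            break
        level += 1
    return level


def make_threshold_judge(threshold: float) -> Judge:
    """Hypothetical stand-in for an LLM judge: compares dwell time to a cutoff."""
    return lambda sample: sample["dwell_seconds"] > threshold


# Three binary judges yield four usefulness levels (0, 1, 2, 3).
# The cutoffs are illustrative assumptions, not values from the paper.
judges = [make_threshold_judge(t) for t in (5.0, 30.0, 120.0)]

if __name__ == "__main__":
    for dwell in (2.0, 10.0, 60.0, 300.0):
        print(dwell, cascade_label({"dwell_seconds": dwell}, judges))
```

One property of this design is that higher levels are only queried when lower ones pass, so a document judged not useful at the first step costs a single judgment.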
Taxonomy
Topics: Information Retrieval and Search Behavior · Recommender Systems and Techniques · Text Readability and Simplification
