Query-Document Dense Vectors for LLM Relevance Judgment Bias Analysis
Samaneh Mohtadi, Gianluca Demartini

TL;DR
This paper introduces a clustering-based framework for analyzing systematic biases in LLM relevance judgments. It reveals the specific semantic areas where LLMs diverge from human assessments, with the aim of making IR evaluation more reliable.
Contribution
The paper proposes a novel query-document representation and clustering method to localize and understand where LLM relevance judgments systematically diverge from human ones.
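As a rough illustration of the idea, the sketch below clusters a handful of query-document pair vectors and reports how often LLM and human labels disagree within each cluster. It is a minimal sketch under stated assumptions, not the authors' pipeline: the 2-D `pair_vectors` are toy stand-ins for real dense embeddings, the labels are invented binary values, and the tiny `kmeans` helper with first-k initialization is a hypothetical substitute for whatever clustering method the paper actually uses.

```python
from collections import defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Tiny k-means; deterministic because the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign

def disagreement_by_cluster(assign, human, llm):
    """Fraction of pairs in each cluster where the two labels differ."""
    totals, diffs = defaultdict(int), defaultdict(int)
    for c, h, l in zip(assign, human, llm):
        totals[c] += 1
        diffs[c] += int(h != l)
    return {c: diffs[c] / totals[c] for c in totals}

# Toy data (assumption): one vector per query-document pair, binary labels.
pair_vectors = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0),
                (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
human_labels = [1, 1, 0, 1, 1, 0]
llm_labels = [1, 0, 0, 0, 1, 0]

clusters = kmeans(pair_vectors, k=2)
rates = disagreement_by_cluster(clusters, human_labels, llm_labels)
print(rates)
```

In this toy setup all disagreements fall into one of the two clusters, which is the kind of localized pattern the method is meant to surface. An average agreement score over all six pairs would hide that concentration.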
Findings
Systematic disagreements are concentrated in specific semantic clusters.
LLMs often under-recall relevant content in definition and policy-related queries.
Disagreement hotspots are identified where LLMs show consistent biases.
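To make the notion of a directional hotspot concrete, the following sketch counts, per cluster, how often the LLM misses content that humans judged relevant versus how often it marks non-relevant content as relevant. This is an illustrative sketch only: the cluster ids, the binary labels, the `min_rate` threshold, and the helper names `direction_counts` and `hotspots` are all hypothetical and do not come from the paper.

```python
from collections import Counter

def direction_counts(cluster_ids, human, llm):
    """Per cluster, count 'under' (human=1, LLM=0) and 'over' (human=0, LLM=1).

    Binary relevance labels are assumed for simplicity.
    """
    counts = {}
    for c, h, l in zip(cluster_ids, human, llm):
        bucket = counts.setdefault(c, Counter())
        bucket["total"] += 1
        if h == 1 and l == 0:
            bucket["under"] += 1
        elif h == 0 and l == 1:
            bucket["over"] += 1
    return counts

def hotspots(counts, min_rate=0.5):
    """Flag clusters whose disagreement rate reaches min_rate (a hypothetical cutoff)."""
    flagged = {}
    for c, b in counts.items():
        rate = (b["under"] + b["over"]) / b["total"]
        if rate >= min_rate:
            if b["under"] > b["over"]:
                flagged[c] = "under-recall"
            elif b["over"] > b["under"]:
                flagged[c] = "over-estimation"
            else:
                flagged[c] = "mixed"
    return flagged

# Toy data (assumption): two clusters of four query-document pairs each.
cluster_ids = ["A", "A", "A", "A", "B", "B", "B", "B"]
human = [1, 1, 1, 0, 1, 0, 1, 0]
llm = [0, 0, 1, 0, 1, 0, 1, 1]

counts = direction_counts(cluster_ids, human, llm)
flagged = hotspots(counts)
print(flagged)
```

Separating the direction of the error from its rate is a deliberate choice here: a cluster where the LLM consistently errs one way suggests a systematic bias, while a cluster with balanced errors looks more like noise.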
Abstract
Large Language Models (LLMs) have been used as relevance assessors for creating Information Retrieval (IR) evaluation collections because they cost less and scale better than human assessors. While previous research has examined how reliable LLMs are compared to human assessors, in this work we aim to understand whether LLMs make systematic mistakes when judging relevance, rather than only how good they are on average. To this end, we propose a novel representational method for queries and documents that allows us to analyze relevance label distributions and to compare LLM and human labels, identifying patterns of disagreement and localizing the areas where it is systematic. We introduce a clustering-based framework that embeds query-document (Q-D) pairs into a joint semantic space, treating relevance as a relational property. Experiments on TREC Deep Learning…
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Expert finding and Q&A systems
