MultiwayPAM: Multiway Partitioning Around Medoids for LLM-as-a-Judge Score Analysis
Chihiro Watanabe, Jingyu Sun

TL;DR
This paper introduces MultiwayPAM, a tensor clustering method for analyzing LLM-as-a-Judge score tensors. It targets two problems, the computational cost of LLM inference and the bias of LLM evaluators, and reveals the underlying score structure across questions, answerers, and evaluators.
Contribution
The paper presents a novel tensor clustering algorithm, MultiwayPAM, that simultaneously estimates the cluster memberships and the medoids for each mode of a score tensor, supporting the analysis of LLM evaluation bias.
Findings
MultiwayPAM effectively clusters score data in practical datasets.
The method reveals insights into bias and structure in LLM evaluation scores.
Experimental results demonstrate the utility of MultiwayPAM in real-world scenarios.
Abstract
LLM-as-a-Judge is a flexible framework for text evaluation, which allows us to obtain scores for the quality of a given text from various perspectives by changing the prompt template. Two main challenges in using LLM-as-a-Judge are computational cost of LLM inference, especially when evaluating a large number of texts, and inherent bias of an LLM evaluator. To address these issues and reveal the structure of score bias caused by an LLM evaluator, we propose to apply a tensor clustering method to a given LLM-as-a-Judge score tensor, whose entries are the scores for different combinations of questions, answerers, and evaluators. Specifically, we develop a new tensor clustering method MultiwayPAM, with which we can simultaneously estimate the cluster membership and the medoids for each mode of a given data tensor. By observing the medoids obtained by MultiwayPAM, we can gain knowledge…
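To make the idea of "cluster memberships and medoids for each mode" concrete, here is a minimal sketch in Python. It is not the authors' MultiwayPAM algorithm: it assumes a simplified stand-in in which each mode of the tensor is unfolded into a matrix and clustered independently with a plain alternating k-medoids procedure under Euclidean distance, whereas the abstract describes estimating all modes simultaneously. The function names `k_medoids` and `per_mode_medoids`, the cluster counts, and the synthetic score tensor are all hypothetical and chosen only for illustration.

```python
import numpy as np

def k_medoids(rows, k, n_iter=50, seed=0):
    """Plain alternating k-medoids on the rows of a 2-D array.

    Illustrative only: Euclidean distance and random initialization
    are assumptions, not choices taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = rows.shape[0]
    # Pairwise Euclidean distances between all rows.
    dist = np.linalg.norm(rows[:, None, :] - rows[None, :, :], axis=-1)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each row joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # Update step: the medoid is the member with the smallest
            # total distance to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return labels, medoids

def per_mode_medoids(tensor, ks, seed=0):
    """Cluster every mode of a tensor independently (a simplification)."""
    results = []
    for mode, k in enumerate(ks):
        # Unfold: rows index the items of this mode, columns everything else.
        unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
        results.append(k_medoids(unfolded, k, seed=seed))
    return results

# Synthetic (questions x answerers x evaluators) score tensor:
# questions 0-2 receive low scores, questions 3-5 receive high scores.
rng = np.random.default_rng(42)
scores = np.full((6, 4, 3), 2.0)
scores[3:] = 8.0
scores += rng.normal(0.0, 0.1, size=scores.shape)

results = per_mode_medoids(scores, ks=(2, 2, 2))
question_labels, question_medoids = results[0]
```

In this sketch, `question_medoids` holds the indices of representative questions; inspecting the score slices at the medoid indices of each mode is one way to read off a compact summary of the tensor, in the spirit of observing the medoids as the abstract describes.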
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies
