Empirical Cumulative Distribution Function Clustering for LLM-based Agent System Analysis
Chihiro Watanabe, Jingyu Sun

TL;DR
This paper introduces an ECDF-based evaluation framework and clustering method for analyzing the distributional quality of LLM-generated responses, providing deeper insights than traditional accuracy metrics.
Contribution
It presents a novel ECDF-based response evaluation method and a clustering approach to analyze LLM response distributions, revealing nuanced differences in agent configurations.
Findings
ECDFs distinguish between agent settings with similar accuracy but different response qualities.
Clustering of ECDFs reveals interpretable group structures related to temperature, persona, and topics.
The framework offers more nuanced insights into LLM response quality than traditional metrics.
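As a rough illustration of the first finding, the sketch below builds ECDFs for two hypothetical agent settings. The similarity scores, the 0.8 "correct" threshold, and the names `ecdf`, `setting_a`, and `setting_b` are all made up for this example and do not come from the paper. Both settings have the same thresholded accuracy, yet their ECDFs differ in the low-similarity range.

```python
import numpy as np

def ecdf(samples, grid):
    """Return the fraction of samples that are <= each grid point."""
    samples = np.sort(np.asarray(samples, dtype=float))
    return np.searchsorted(samples, grid, side="right") / samples.size

# Hypothetical similarity scores for two agent settings (illustrative only).
setting_a = [0.2, 0.2, 0.9, 0.9]
setting_b = [0.5, 0.5, 0.9, 0.9]

# Assumed proxy for accuracy: share of responses with similarity >= 0.8.
acc_a = float(np.mean(np.array(setting_a) >= 0.8))
acc_b = float(np.mean(np.array(setting_b) >= 0.8))

grid = np.array([0.25, 0.5, 0.75, 1.0])
ecdf_a = ecdf(setting_a, grid)
ecdf_b = ecdf(setting_b, grid)

print("accuracy:", acc_a, acc_b)   # identical under the assumed threshold
print("ECDF A:", ecdf_a)
print("ECDF B:", ecdf_b)           # differs from A at the lowest grid point
```

Here the incorrect responses of setting B sit closer to the reference than those of setting A, which a single accuracy number would not show.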
Abstract
Large language models (LLMs) are increasingly used as agents to solve complex tasks such as question answering (QA), scientific debate, and software development. A standard evaluation procedure aggregates multiple responses from LLM agents into a single final answer, often via majority voting, and compares it against reference answers. However, this process can obscure the quality and distributional characteristics of the original responses. In this paper, we propose a novel evaluation framework based on the empirical cumulative distribution function (ECDF) of cosine similarities between generated responses and reference answers. This enables a more nuanced assessment of response quality beyond exact match metrics. To analyze the response distributions across different agent configurations, we further introduce a clustering method for ECDFs using their distances and the k-medoids…
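The pipeline outlined in the abstract (similarity scores, one ECDF per agent configuration, pairwise ECDF distances, medoid-based clustering) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract is cut off before it names the distance, so the L1 (area-between-curves) distance used here is an assumption, as are the simple alternating k-medoids loop, the synthetic similarity scores, and every function name.

```python
import numpy as np

def ecdf_on_grid(samples, grid):
    """Evaluate the ECDF of `samples` at each point of `grid`."""
    samples = np.sort(np.asarray(samples, dtype=float))
    return np.searchsorted(samples, grid, side="right") / samples.size

def ecdf_l1_distance(f, g, grid):
    """Approximate the area between two ECDFs on a uniform grid.
    ASSUMPTION: the source does not state which distance is used."""
    return float(np.mean(np.abs(f - g)) * (grid[-1] - grid[0]))

def k_medoids(dist, k, n_iter=100, seed=0):
    """Plain alternating k-medoids on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(dist.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        # Assign every item to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # New medoid: member with the smallest total in-cluster distance.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Synthetic cosine similarities for four hypothetical agent configurations.
configs = [
    [0.10, 0.20, 0.30],   # low-similarity configuration
    [0.15, 0.20, 0.35],   # low-similarity configuration
    [0.80, 0.90, 0.95],   # high-similarity configuration
    [0.85, 0.90, 0.90],   # high-similarity configuration
]
grid = np.linspace(-1.0, 1.0, 201)  # cosine similarity lies in [-1, 1]
ecdfs = [ecdf_on_grid(s, grid) for s in configs]

n = len(ecdfs)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = ecdf_l1_distance(ecdfs[i], ecdfs[j], grid)

medoids, labels = k_medoids(dist, k=2)
print("medoids:", medoids, "labels:", labels)
```

Because each medoid is itself an observed ECDF, every cluster can be summarized by one concrete agent configuration, which is one reason a medoid-based method is a natural fit for clustering distribution functions.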
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Computational and Text Analysis Methods
