Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

Fangzhou Wu; Sandeep Silwal; Qiuyi Zhang

arXiv:2605.17110·cs.AI·May 19, 2026

TL;DR

This paper introduces ECC, a novel clustering algorithm that calibrates semantic embeddings with model comparisons to better reflect latent LLM capabilities, improving evaluation and query routing.

Contribution

ECC is a new capability-aware clustering method that aligns semantic embeddings with model performance, enabling more accurate LLM capability assessment.

Findings

01. ECC outperforms human-labeled baselines by 17.64 percentage points.

02. ECC surpasses embedding-based baselines by 18.02 percentage points.

03. ECC improves LLM capability ranking and query routing tasks.

Abstract

Query clustering organizes queries into groups that reflect shared latent capability demands, enabling capability-aware LLM evaluation. Existing clustering methods, which primarily rely on semantic taxonomies or embeddings, often fail to capture such latent capability requirements due to a misalignment between surface-level semantics and actual model performance. We propose ECC, an algorithm that calibrates prior semantic embeddings using limited posterior model comparisons to bridge the gap between surface-level semantics and latent capability requirements. ECC characterizes each cluster through a capability profile parameterized by a Bradley-Terry model and uses trainable mixture weights to accommodate queries with mixed capability demands, jointly learning a flexible, capability-aware clustering structure that supports query-specific inference of LLM capabilities. Extensive…
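To make the modeling idea in the abstract concrete, here is a minimal sketch of a mixture of Bradley-Terry models: each cluster holds one score per LLM, and a query's mixture weights blend the per-cluster win probabilities. This is an illustration under stated assumptions, not the paper's implementation. The function names, the softmax parameterization of the mixture weights, and the model names `model_x` and `model_y` are all hypothetical, and the sketch does not cover how ECC derives or trains the weights from semantic embeddings.

```python
import math


def sigmoid(x):
    """Logistic function; maps a Bradley-Terry score gap to a win probability."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Turn unnormalized mixture logits into weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def win_probability(mixture_logits, cluster_scores, model_a, model_b):
    """P(model_a beats model_b on one query) under a mixture of Bradley-Terry models.

    mixture_logits: one logit per cluster for this query (assumed parameterization).
    cluster_scores: one dict per cluster mapping model name -> Bradley-Terry score.
    """
    weights = softmax(mixture_logits)
    return sum(
        w * sigmoid(scores[model_a] - scores[model_b])
        for w, scores in zip(weights, cluster_scores)
    )


# Two hypothetical clusters with opposite preferences over two hypothetical models.
cluster_scores = [
    {"model_x": 1.0, "model_y": 0.0},  # cluster 0 favors model_x
    {"model_x": 0.0, "model_y": 1.0},  # cluster 1 favors model_y
]

# A query split evenly between both clusters.
p_mixed = win_probability([0.0, 0.0], cluster_scores, "model_x", "model_y")

# A query that belongs almost entirely to cluster 0.
p_cluster0 = win_probability([10.0, -10.0], cluster_scores, "model_x", "model_y")
```

In this toy setup, the evenly mixed query gives neither model an edge, while the query concentrated on cluster 0 inherits that cluster's preference for `model_x`. That is the sense in which mixture weights let one query draw on several capability profiles.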
