Coordinates of Capability: A Unified MTMM-Geometric Framework for LLM Evaluation
Adib Sakhawat, Tahsin Islam, Takia Farhin, Syed Rifat Raiyan, Hasan Mahmud, and Md Kamrul Hasan

TL;DR
This paper introduces a geometric Multi-Trait Multi-Method (MTMM) framework for LLM evaluation that places multiple metrics in a shared latent space, with the aim of improving construct validity and robustness.
Contribution
The paper formalizes nine evaluation metrics as geometric measurements within a shared latent coordinate space, separating method variance (such as prompt sensitivity) from true latent capabilities.
Findings
Unifies nine metrics into a shared geometric framework
Identifies three orthogonal latent dimensions of model behavior
Provides a domain-agnostic taxonomy for robust benchmarking
Abstract
The evaluation of Large Language Models (LLMs) faces a critical challenge in construct validity, where fragmented benchmarks and ad hoc metrics frequently conflate method variance, such as prompt sensitivity, with true latent capabilities. Concurrently, emerging research suggests that LLM capabilities and outputs can be modeled as continuous geometric manifolds. In this Systematization of Knowledge (SoK), we bridge these paradigms by proposing a generalized Multi-Trait Multi-Method (MTMM) framework for LLM evaluation. We formalize and unify nine evaluation metrics, including Paraphrase Instability, Drift Score, Overton Width, and Pluralism Score, interpreting them not as isolated scalar values but as geometric measurements within a shared latent coordinate space. This spatial unification factorizes model behavior into three orthogonal latent dimensions: (1) Instability and Sensitivity,…
