Coordinates of Capability: A Unified MTMM-Geometric Framework for LLM Evaluation

Adib Sakhawat, Tahsin Islam, Takia Farhin, Syed Rifat Raiyan, Hasan Mahmud, and Md Kamrul Hasan

arXiv:2605.08522·cs.CL·May 15, 2026

TL;DR

This paper introduces a geometric MTMM framework for LLM evaluation that unifies multiple metrics into a shared latent space, improving construct validity and robustness.

Contribution

It formalizes nine evaluation metrics as geometric measurements within a shared space, separating task-irrelevant factors from true capabilities.

Findings

01. Unifies nine metrics into a shared geometric framework

02. Identifies three orthogonal latent dimensions of model behavior

03. Provides a domain-agnostic taxonomy for robust benchmarking

Abstract

The evaluation of Large Language Models (LLMs) faces a critical challenge in construct validity, where fragmented benchmarks and ad hoc metrics frequently conflate method variance, such as prompt sensitivity, with true latent capabilities. Concurrently, emerging research suggests that LLM capabilities and outputs can be modeled as continuous geometric manifolds. In this Systematization of Knowledge (SoK), we bridge these paradigms by proposing a generalized Multi-Trait Multi-Method (MTMM) framework for LLM evaluation. We formalize and unify nine evaluation metrics, including Paraphrase Instability, Drift Score, Overton Width, and Pluralism Score, interpreting them not as isolated scalar values but as geometric measurements within a shared latent coordinate space. This spatial unification factorizes model behavior into three orthogonal latent dimensions: (1) Instability and Sensitivity,…
