Latency-Response Theory Model: Evaluating Large Language Models via Response Accuracy and Chain-of-Thought Length

Zhiyu Xu; Jia Liu; Yixin Wang; Yuqi Gu

arXiv:2512.07019 · stat.ME · December 12, 2025

TL;DR

This paper introduces Latency-Response Theory (LaRT), an evaluation framework for large language models (LLMs) that jointly models response accuracy and chain-of-thought (CoT) length. The authors report that it gives more precise assessments than accuracy-only Item Response Theory (IRT).

Contribution

LaRT extends Item Response Theory (IRT) in three ways. It treats CoT length as a response latency, and it models the correlation between a latent reasoning ability and a latent speed. It is also shown to improve estimation accuracy and the resulting evaluation metrics relative to IRT.
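
This page does not reproduce the paper's exact likelihood. The sketch below shows one standard way to couple accuracy and latency, following van der Linden's hierarchical speed-accuracy framework from psychometrics. It rests on three assumptions, none taken from the paper:

- correctness follows a two-parameter logistic (2PL) model;
- the CoT length $T_{ij}$ of model $i$ on item $j$ follows a lognormal model;
- ability $\theta_i$ and speed $\tau_i$ are bivariate normal, and their correlation $\rho$ stands in for the "key correlation parameter".

LaRT's actual parameterization may differ.

```latex
\begin{aligned}
P(Y_{ij} = 1 \mid \theta_i) &= \frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}}, \\
\log T_{ij} &= \beta_j - \tau_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim \mathcal{N}(0,\, \alpha_j^{-2}), \\
\begin{pmatrix} \theta_i \\ \tau_i \end{pmatrix} &\sim \mathcal{N}_2\!\left( \begin{pmatrix} \mu_\theta \\ \mu_\tau \end{pmatrix},\ \begin{pmatrix} \sigma_\theta^2 & \rho\,\sigma_\theta\sigma_\tau \\ \rho\,\sigma_\theta\sigma_\tau & \sigma_\tau^2 \end{pmatrix} \right).
\end{aligned}
```

Under this sign convention, a larger speed $\tau_i$ shortens the expected CoT. A negative $\rho$ therefore means that higher-ability models tend to produce longer chains of thought.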

Findings

01. LaRT outperforms IRT in estimation accuracy and confidence interval precision.

02. A strong negative correlation between ability and speed is observed across benchmarks.

03. LaRT provides different and more reliable LLM rankings than IRT.

Abstract

The proliferation of Large Language Models (LLMs) necessitates valid evaluation methods to guide downstream applications and actionable future improvements. Item Response Theory (IRT) has recently emerged as a promising framework for evaluating LLMs via their response accuracy. Beyond simple response accuracy, the lengths of LLMs' chains of thought (CoT) serve as a vital indicator of their reasoning ability. To leverage the CoT length information to assist the evaluation of LLMs, we propose Latency-Response Theory (LaRT), which jointly models response accuracy and CoT length by introducing a latent ability, a latent speed, and a key correlation parameter between them. We derive an efficient estimation algorithm and establish rigorous identifiability results for the population parameters to ensure the statistical validity of estimation. Theoretical asymptotic analyses and simulation studies…
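
The following Python sketch makes the joint structure concrete. It simulates accuracy and CoT-length data under an assumed model in the style of van der Linden's hierarchical speed-accuracy framework, with three components:

- a 2PL logistic model for correctness;
- a lognormal model for CoT length;
- ability and speed drawn from a bivariate normal with correlation `rho`.

The function name `simulate_lart_like` and all item-parameter ranges are hypothetical choices for illustration. The closing correlation check is a crude moment-based diagnostic, not the estimation algorithm derived in the paper.

```python
import numpy as np


def simulate_lart_like(n_models=200, n_items=50, rho=-0.6, seed=0):
    """Simulate binary accuracy and CoT length for n_models LLMs on n_items items.

    Hypothetical model (an assumption, not the paper's specification):
      P(Y_ij = 1) = sigmoid(a_j * (theta_i - b_j))          2PL accuracy
      log T_ij    = beta_j - tau_i + eps_ij,
                    eps_ij ~ N(0, 1 / alpha_j**2)            lognormal length
      (theta_i, tau_i) ~ bivariate standard normal with correlation rho
    """
    rng = np.random.default_rng(seed)

    # Correlated latent traits: column 0 is ability theta, column 1 is speed tau.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    latent = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_models)
    theta, tau = latent[:, 0], latent[:, 1]

    # Illustrative item parameters (ranges chosen arbitrarily).
    a = rng.uniform(0.8, 2.0, size=n_items)      # discrimination
    b = rng.normal(0.0, 1.0, size=n_items)       # difficulty
    beta = rng.normal(6.0, 0.5, size=n_items)    # time intensity on the log-token scale
    alpha = rng.uniform(1.5, 3.0, size=n_items)  # precision of the log length

    # Accuracy: broadcast (n_models, 1) against (n_items,) -> (n_models, n_items).
    p_correct = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    y = rng.binomial(1, p_correct)

    # CoT length: a larger speed tau_i shortens every chain of thought.
    noise = rng.normal(size=(n_models, n_items)) / alpha
    t = np.exp(beta - tau[:, None] + noise)
    return theta, tau, y, t


if __name__ == "__main__":
    theta, tau, y, t = simulate_lart_like(rho=-0.6)
    accuracy = y.mean(axis=1)              # per-model accuracy
    mean_log_len = np.log(t).mean(axis=1)  # per-model mean log CoT length
    print("sample corr(theta, tau):", np.corrcoef(theta, tau)[0, 1])
    print("corr(accuracy, mean log length):", np.corrcoef(accuracy, mean_log_len)[0, 1])
```

With a negative `rho`, per-model accuracy and per-model mean log length should come out positively correlated. High-ability models are drawn with low speed and therefore produce long chains of thought. An accuracy-only model never sees this length signal.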

Taxonomy

Topics: Computational and Text Analysis Methods · Topic Modeling · Psychometric Methodologies and Testing