Markovian ODE-guided scoring can assess the quality of offline reasoning traces in language models

Arghodeep Nandi; Ojasva Saxena; Tanmoy Chakraborty

arXiv:2603.01580·cs.CL·March 3, 2026

Open Access

TL;DR

MarODE is a novel offline evaluation framework that uses Markovian ODE modeling to assess reasoning trace quality in language models, outperforming existing methods and aligning with human judgments.

Contribution

Introduces MarODE, a theory-driven, ODE-based evaluation method that measures reasoning trace quality in language models and addresses limitations of prior mechanical metrics.

Findings

01. MarODE outperforms baselines by over 250% in correlation with human judgments.

02. The framework effectively captures human-centric notions of reasoning quality.

03. Markovian ODE modeling enables efficient and generalizable evaluation of reasoning traces.

Abstract

Reasoning traces produced by generative language models are increasingly used for tasks ranging from mathematical problem solving to automated fact checking. However, existing evaluation methods remain largely mechanical and fail to capture human-centric notions of reasoning quality in a way that generalizes across varied and progressively degraded reasoning. We introduce MarODE, an offline evaluation framework that assigns quality scores to reasoning traces. Its effectiveness is assessed using human-centric perturbations and human judgments, which jointly evaluate the two fundamental dimensions of an evaluation metric: goodness and soundness. The approach is grounded in a Markovian formulation of reasoning progression and an ordinary differential equation-based characterization of trace dynamics, enabling efficient evaluation of reasoning quality. In a large-scale evaluation, MarODE…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification