Markovian ODE-guided scoring can assess the quality of offline reasoning traces in language models
Arghodeep Nandi, Ojasva Saxena, Tanmoy Chakraborty

TL;DR
MarODE is a novel offline evaluation framework that uses Markovian ODE modeling to assess reasoning trace quality in language models, outperforming existing methods and aligning with human judgments.
Contribution
The paper introduces MarODE, a theory-driven, ODE-based evaluation method for measuring reasoning trace quality in language models, addressing the limitations of prior mechanical metrics.
Findings
MarODE outperforms baselines by over 250% in correlation with human judgments.
The framework effectively captures human-centric notions of reasoning quality.
Markovian ODE modeling enables efficient and generalizable evaluation of reasoning traces.
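To make the correlation-based comparison in the findings above concrete, here is a minimal sketch of how a metric's agreement with human judgments, and a relative improvement over a baseline, might be computed. The summary does not state which correlation coefficient the authors use or how the percentage is derived, so the choice of Spearman rank correlation, the relative-improvement formula, all function names, and the toy scores below are assumptions for illustration only, not the paper's procedure or data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def relative_improvement_pct(new_corr, base_corr):
    """Percentage by which new_corr exceeds base_corr (assumed definition)."""
    return 100.0 * (new_corr - base_corr) / abs(base_corr)


# Hypothetical toy data: human ratings for five traces and two metrics' scores.
human = [1, 2, 3, 4, 5]
metric_a = [0.1, 0.3, 0.2, 0.8, 0.9]  # stands in for a stronger metric
metric_b = [0.5, 0.1, 0.4, 0.3, 0.6]  # stands in for a baseline

corr_a = spearman(human, metric_a)
corr_b = spearman(human, metric_b)
print(f"metric_a: {corr_a:.2f}, metric_b: {corr_b:.2f}")
print(f"relative improvement: {relative_improvement_pct(corr_a, corr_b):.0f}%")
```

A relative figure of this kind depends heavily on the baseline's own correlation: when the baseline correlates only weakly with human ratings, even a modest absolute gain yields a large percentage.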
Abstract
Reasoning traces produced by generative language models are increasingly used for tasks ranging from mathematical problem solving to automated fact checking. However, existing evaluation methods remain largely mechanical and fail to capture human-centric notions of reasoning quality in a way that generalizes across varied and progressively degraded reasoning. We introduce MarODE, an offline evaluation framework that assigns quality scores to reasoning traces. Its effectiveness is assessed using human-centric perturbations and human judgments, which jointly evaluate the fundamental dimensions of an evaluation metric: goodness and soundness. The approach is grounded in a Markovian formulation of reasoning progression and an ordinary-differential-equation-based characterization of trace dynamics, enabling efficient evaluation of reasoning quality. In a large-scale evaluation, MarODE…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
