Reasoning Relay: Evaluating Stability and Interchangeability of Large Language Models in Mathematical Reasoning

Leo Lu; Jonathan Zhang; Sean Chua; Spencer Kim; Kevin Zhu; Sean O'Brien; Vasu Sharma

arXiv:2512.20647·cs.AI·December 25, 2025

TL;DR

This paper investigates whether a reasoning chain begun by one large language model can be reliably continued by another, and uses that test to assess the stability and interchangeability of model reasoning as a basis for modular AI reasoning.

Contribution

The paper introduces a framework that uses truncation and continuation experiments to evaluate how interchangeable reasoning is across models, and it highlights the potential for modular reasoning in AI systems.

Findings

01. Hybrid reasoning chains can preserve or improve accuracy.

02. Interchangeability of reasoning is an emerging property.

03. Framework enables reproducible assessment of reasoning stability.

Abstract

Chain-of-Thought (CoT) prompting has significantly advanced the reasoning capabilities of large language models (LLMs). While prior work focuses on improving model performance through internal reasoning strategies, little is known about the interchangeability of reasoning across different models. In this work, we explore whether a partially completed reasoning chain from one model can be reliably continued by another model, either within the same model family or across families. We achieve this by assessing the sufficiency of intermediate reasoning traces as transferable scaffolds for logical coherence and final answer accuracy. We interpret this interchangeability as a means of examining inference-time trustworthiness, probing whether reasoning remains both coherent and reliable under model substitution. Using token-level log-probability thresholds to truncate reasoning at early, mid,…
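
The abstract is cut off before it states how the log-probability thresholds are defined, so the following is only a minimal sketch of one possible reading: cut the chain at the first token where the cumulative token log-probability reaches a chosen fraction of the chain's total, then hand the prefix to a second model. The function names, the cumulative-fraction rule, and the prompt layout are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def truncation_index(logprobs: Sequence[float], fraction: float) -> int:
    """Return how many leading tokens to keep.

    Assumed rule (not from the paper): keep tokens until their cumulative
    log-probability first reaches `fraction` of the whole chain's total.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(logprobs)            # log-probs are <= 0, so total is <= 0
    target = fraction * total
    running = 0.0
    for i, lp in enumerate(logprobs):
        running += lp
        if running <= target:        # threshold crossed at token i
            return i + 1
    return len(logprobs)             # fallback: keep the full chain


def handoff_prompt(question: str, tokens: List[str], cut: int) -> str:
    """Build a hypothetical prompt asking a second model to continue."""
    prefix = "".join(tokens[:cut])
    return f"Question: {question}\nReasoning so far: {prefix}"


# Toy chain from a first model, with made-up token log-probabilities.
tokens = ["Let", " x", " =", " 3", "."]
logprobs = [-0.5, -0.25, -1.0, -0.25, -2.0]

early_cut = truncation_index(logprobs, 0.25)   # 3 tokens kept
mid_cut = truncation_index(logprobs, 0.50)     # 4 tokens kept
late_cut = truncation_index(logprobs, 0.75)    # 5 tokens kept
prompt = handoff_prompt("Solve for x.", tokens, early_cut)
```

In a real experiment the prompt would be sent to the continuing model and its final answer scored against the reference; that step is omitted here because the source does not describe the models' interfaces.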

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Multimodal Machine Learning Applications