Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic

Zhenjiang Mao; Artem Bisliouk; Rohith Reddy Nama; Ivan Ruchkin

arXiv:2506.08243 · cs.LG · June 11, 2025

Open Access

TL;DR

This paper introduces a framework that treats the stepwise confidence of chain-of-thought reasoning in large language models as a temporal signal and evaluates it with Signal Temporal Logic (STL), improving the reliability of confidence estimates.

Contribution

The paper proposes a formal STL-based method to evaluate and reshape stepwise confidence signals in LLM reasoning, improving both interpretability and calibration.
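For reference, the standard quantitative (robustness) semantics of STL can be written for a discrete per-step confidence signal $c = (c_1, \dots, c_T)$ as shown below. This is the textbook definition, not necessarily the exact constraint set used in the paper, and the threshold $\theta$ is a hypothetical parameter. A positive robustness value means the property holds with that margin; a negative value measures how badly it is violated.

```latex
% Discrete confidence signal c = (c_1, ..., c_T); \theta is an illustrative threshold
\begin{aligned}
\rho(c_t \ge \theta,\, c,\, t) &= c_t - \theta \\
\rho(\neg\varphi,\, c,\, t) &= -\rho(\varphi,\, c,\, t) \\
\rho(\varphi \wedge \psi,\, c,\, t) &= \min\bigl(\rho(\varphi,\, c,\, t),\, \rho(\psi,\, c,\, t)\bigr) \\
\rho(\mathbf{G}_{[a,b]}\varphi,\, c,\, t) &= \min_{t' \in [t+a,\, t+b]} \rho(\varphi,\, c,\, t') \\
\rho(\mathbf{F}_{[a,b]}\varphi,\, c,\, t) &= \max_{t' \in [t+a,\, t+b]} \rho(\varphi,\, c,\, t') \\[4pt]
\rho\bigl(\mathbf{G}_{[0,T-1]}(c \ge \theta),\, c,\, 1\bigr) &= \min_{1 \le t \le T} \,(c_t - \theta)
\end{aligned}
```

The last line shows why robustness works as a confidence estimate: "always above $\theta$" is scored by the single weakest step in the chain.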

Findings

01 Improves confidence calibration metrics
02 Provides more reliable uncertainty estimates
03 Enforces smoothness and causal consistency
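Reshaping can be illustrated with two simple transforms over a per-step confidence trace. This is a minimal sketch under stated assumptions, not the paper's reshaping strategies: smoothness is approximated with a causal exponential moving average (each output depends only on earlier steps), and monotonicity is enforced with a running minimum, on the hypothetical reading that a later step should never be more confident than the weakest step it builds on. The function names and the `alpha` parameter are illustrative.

```python
from itertools import accumulate
from typing import List, Sequence


def smooth_ema(conf: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Causal exponential moving average: each output uses only current and past steps."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # accumulate calls the function as f(previous_output, current_input);
    # the first element passes through unchanged.
    return list(accumulate(conf, lambda prev, c: alpha * c + (1.0 - alpha) * prev))


def enforce_non_increasing(conf: Sequence[float]) -> List[float]:
    """Running minimum: no step is more confident than the weakest earlier step."""
    return list(accumulate(conf, min))


def reshape(conf: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Hypothetical pipeline: smooth first, then enforce monotonicity."""
    return enforce_non_increasing(smooth_ema(conf, alpha))


trace = [0.9, 0.6, 0.8, 0.7]           # illustrative per-step confidences
print(enforce_non_increasing(trace))   # [0.9, 0.6, 0.6, 0.6]
print(reshape(trace))
```

Applying the smoothing before the running minimum keeps a single noisy dip from permanently capping the rest of the trajectory, at the cost of reacting more slowly to a genuine drop in confidence.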

Abstract

Large Language Models (LLMs) have shown impressive performance in mathematical reasoning tasks when guided by Chain-of-Thought (CoT) prompting. However, they tend to produce highly confident yet incorrect outputs, which poses significant risks in domains like education, where users may lack the expertise to assess reasoning steps. To address this, we propose a structured framework that models stepwise confidence as a temporal signal and evaluates it using Signal Temporal Logic (STL). In particular, we define formal STL-based constraints to capture desirable temporal properties and compute robustness scores that serve as structured, interpretable confidence estimates. Our approach also introduces a set of uncertainty reshaping strategies to enforce smoothness, monotonicity, and causal consistency across the reasoning trajectory. Experiments show that our approach consistently improves…
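The idea of turning STL robustness into a confidence score can be sketched on a list of per-step confidences. This is an illustration under assumptions, not the authors' implementation: the two specifications ("confidence always stays above `theta`" and "consecutive steps never change by more than `delta`"), the default values `theta=0.5` and `delta=0.2`, and all function names are hypothetical. The combined score uses the standard rule that a conjunction's robustness is the minimum of its parts.

```python
from typing import Sequence


def rho_always_above(conf: Sequence[float], theta: float) -> float:
    """Robustness of G(c_t >= theta): the worst margin over all steps."""
    if not conf:
        raise ValueError("confidence trace must be non-empty")
    return min(c - theta for c in conf)


def rho_eventually_above(conf: Sequence[float], theta: float) -> float:
    """Robustness of F(c_t >= theta): the best margin over all steps."""
    if not conf:
        raise ValueError("confidence trace must be non-empty")
    return max(c - theta for c in conf)


def rho_smooth(conf: Sequence[float], delta: float) -> float:
    """Robustness of G(|c_{t+1} - c_t| <= delta) over consecutive step pairs.

    A single-step trace has no pairs, so the property holds vacuously (+inf).
    """
    return min((delta - abs(b - a) for a, b in zip(conf, conf[1:])),
               default=float("inf"))


def stl_confidence(conf: Sequence[float], theta: float = 0.5,
                   delta: float = 0.2) -> float:
    """Hypothetical combined score: robustness of the conjunction (minimum)."""
    return min(rho_always_above(conf, theta), rho_smooth(conf, delta))


trace = [0.9, 0.8, 0.75, 0.7]    # illustrative per-step confidences
print(stl_confidence(trace))     # positive: both properties hold with margin
```

A trace such as `[0.9, 0.9, 0.3]` gets a negative score, because the final step both falls below `theta` and jumps by more than `delta`; the sign flags a violation and the magnitude says how severe it is.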

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning

Methods: Sparse Evolutionary Training