Confidence over Time: Confidence Calibration with Temporal Logic for Large Language Model Reasoning

Zhenjiang Mao; Anirudhh Venkat; Artem Bisliouk; Akshat Kothiyal; Sindhura Kumbakonam Subramanian; Saithej Singhu; Ivan Ruchkin

arXiv:2601.13387·cs.CL·January 21, 2026

Open Access

TL;DR

This paper introduces a novel method for calibrating confidence in large language models by analyzing how confidence evolves over reasoning steps using Signal Temporal Logic, leading to more accurate confidence estimates.

Contribution

The paper proposes a stepwise confidence calibration method based on temporal logic, which improves on single scalar confidence scores in LLM reasoning tasks.

Findings

01 The resulting confidence scores are better calibrated than those of the baselines.

02 Temporal logic patterns generalize across tasks.

03 Hypernetwork-informed STL blocks adapt to individual questions.

Abstract

Large Language Models (LLMs) increasingly rely on long-form, multi-step reasoning to solve complex tasks such as mathematical problem solving and scientific question answering. Despite strong performance, existing confidence estimation methods typically reduce an entire reasoning process to a single scalar score, ignoring how confidence evolves throughout the generation. As a result, these methods are often sensitive to superficial factors such as response length or verbosity, and struggle to distinguish correct reasoning from confidently stated errors. We propose to characterize the stepwise confidence signal using Signal Temporal Logic (STL). Using a discriminative STL mining procedure, we discover temporal formulas that distinguish confidence signals of correct and incorrect responses. Our analysis found that the STL patterns generalize across tasks, and numeric parameters exhibit…

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning