Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models

Mousa Salah; Amgad Muneer

arXiv:2604.08563 · cs.CL · April 13, 2026

TL;DR

This study systematically evaluates how sampling temperature affects the performance of chain-of-thought and zero-shot prompting in an extended reasoning large language model, and finds that the best temperature depends on the prompting strategy.

Contribution

It provides the first comprehensive analysis of temperature effects on prompting strategies in extended reasoning LLMs, highlighting the importance of optimizing temperature and prompting strategy jointly.

Findings

01

Zero-shot prompting peaks at moderate temperatures, reaching 59% accuracy at both T=0.4 and T=0.7.

02

Chain-of-thought prompting performs best at the temperature extremes (T=0.0 and T=1.0).

03

The benefit of extended reasoning increases from 6x at T=0.0 to 14.3x at T=1.0.

Abstract

Extended reasoning models represent a transformative shift in Large Language Model (LLM) capabilities by enabling explicit test-time computation for complex problem solving. However, the optimal configuration of sampling temperature and prompting strategy for these systems remains largely underexplored. We systematically evaluate chain-of-thought and zero-shot prompting across four temperature settings (0.0, 0.4, 0.7, and 1.0) using Grok-4.1 with extended reasoning on 39 mathematical problems from AMO-Bench, a challenging International Mathematical Olympiad-level benchmark. We find that zero-shot prompting achieves peak performance at moderate temperatures, reaching 59% accuracy at T=0.4 and T=0.7, while chain-of-thought prompting performs best at the temperature extremes. Most notably, the benefit of extended reasoning increases from 6x at T=0.0 to 14.3x at T=1.0. These results suggest…
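The evaluation design described in the abstract (two prompting strategies crossed with four temperatures, scored on 39 problems) can be sketched as a small grid. This is a minimal illustration, not the authors' code: the names `experiment_grid`, `accuracy`, and `benefit_ratio` are hypothetical, and reading the "benefit of extended reasoning" as a ratio of accuracies with and without extended reasoning is an assumption, since the abstract does not define it. The count of 23 correct answers is likewise an inference (23/39 rounds to the reported 59% if each condition was scored in a single pass over the 39 problems), not a figure stated in the source.

```python
from itertools import product

# Conditions named in the abstract.
STRATEGIES = ("zero_shot", "chain_of_thought")
TEMPERATURES = (0.0, 0.4, 0.7, 1.0)
NUM_PROBLEMS = 39  # AMO-Bench problems used in the study


def experiment_grid():
    """Return every (strategy, temperature) condition to evaluate."""
    return list(product(STRATEGIES, TEMPERATURES))


def accuracy(num_correct, num_problems=NUM_PROBLEMS):
    """Fraction of problems answered correctly in one condition."""
    if not 0 <= num_correct <= num_problems:
        raise ValueError("num_correct must lie between 0 and num_problems")
    return num_correct / num_problems


def benefit_ratio(acc_with_reasoning, acc_without_reasoning):
    """Assumed definition: accuracy with extended reasoning divided by
    accuracy without it. The abstract does not spell out its formula."""
    if acc_without_reasoning <= 0:
        raise ValueError("baseline accuracy must be positive")
    return acc_with_reasoning / acc_without_reasoning


if __name__ == "__main__":
    for strategy, temperature in experiment_grid():
        print(f"{strategy} @ T={temperature}")
    # Hypothetical count: 23 of 39 correct is consistent with the reported 59%.
    print(f"{accuracy(23):.1%}")
```

With only 39 problems, each additional correct answer moves accuracy by about 2.6 percentage points, which is worth keeping in mind when comparing conditions.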
