Finding the Sweet Spot: Trading Quality, Cost, and Speed During Inference-Time LLM Reflection

Jack Butler; Nikita Kozodoi; Zainab Afolabi; Brian Tyacke; Gaiar Baimuratov

arXiv:2510.20653·stat.ML·October 24, 2025

Open Access

TL;DR

This paper systematically compares self-reflection and budget tuning techniques for LLM inference, analyzing their trade-offs in accuracy, cost, and speed across domains, and provides practical guidance for optimal strategy selection.

Contribution

It offers a comprehensive evaluation of self-reflection and budget tuning across multiple LLMs and domains, highlighting domain-dependent effectiveness and providing actionable insights.

Findings

01

Self-reflection can improve performance by up to 220% in mathematical reasoning.

02

Performance gains vary significantly across domains and models.

03

Deployment in real-world systems confirms domain-specific effectiveness.

Abstract

As Large Language Models (LLMs) continue to evolve, practitioners face increasing options for enhancing inference-time performance without model retraining, including budget tuning and multi-step techniques like self-reflection. While these methods improve output quality, they create complex trade-offs among accuracy, cost, and latency that remain poorly understood across different domains. This paper systematically compares self-reflection and budget tuning across mathematical reasoning and translation tasks. We evaluate prominent LLMs, including Anthropic Claude, Amazon Nova, and Mistral families, along with other models, under varying reflection depths and compute budgets to derive Pareto-optimal performance frontiers. Our analysis reveals substantial domain-dependent variation in self-reflection effectiveness, with performance gains up to 220% in mathematical reasoning. We further…
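The abstract refers to deriving Pareto-optimal frontiers over accuracy, cost, and latency. As a rough illustration of what that means, the sketch below filters a set of candidate configurations down to those not dominated on all three axes. The `Config` class, the configuration names, and every number are hypothetical placeholders chosen for the example; they are not taken from the paper, and the paper's own procedure may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str        # hypothetical label for a model/strategy combination
    accuracy: float  # higher is better
    cost: float      # lower is better (arbitrary units)
    latency: float   # lower is better (arbitrary units)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on every axis and strictly better on at least one."""
    no_worse = (
        a.accuracy >= b.accuracy and a.cost <= b.cost and a.latency <= b.latency
    )
    strictly_better = (
        a.accuracy > b.accuracy or a.cost < b.cost or a.latency < b.latency
    )
    return no_worse and strictly_better


def pareto_frontier(configs: list[Config]) -> list[Config]:
    """Keep only configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Made-up numbers for illustration only; these are not results from the paper.
candidates = [
    Config("baseline", accuracy=0.60, cost=1.0, latency=1.0),
    Config("reflect-1", accuracy=0.75, cost=2.1, latency=2.3),
    Config("reflect-3", accuracy=0.74, cost=4.0, latency=4.5),
    Config("budget-high", accuracy=0.80, cost=3.0, latency=2.0),
]
frontier_names = [c.name for c in pareto_frontier(candidates)]
print(frontier_names)
```

In this toy data, `reflect-3` drops out because `reflect-1` is at least as accurate while being cheaper and faster; the remaining three each win on at least one axis, so choosing among them depends on how a deployment weighs quality against cost and speed.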

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Machine Learning in Materials Science