Error-Driven Prompt Optimization for Arithmetic Reasoning

Árpád Pándy; Róbert Lakatos; András Hajdu

arXiv:2512.13323 · cs.AI · December 16, 2025

Open Access

TL;DR

This paper presents an error-driven prompt optimization method that significantly improves the arithmetic reasoning accuracy of small language models, making them more reliable and suitable for industrial, privacy-sensitive applications.

Contribution

The paper introduces an iterative, error-driven prompt refinement framework that enhances small language models' arithmetic reasoning without fine-tuning, outperforming larger models in privacy-sensitive settings.

Findings

1. Model accuracy improved to 70.8% with the proposed method.
2. Error clustering and prompt refinement significantly boost arithmetic performance.
3. Small models can surpass larger models like GPT-3.5 Turbo in privacy-preserving scenarios.

Abstract

Recent advancements in artificial intelligence have sparked interest in industrial agents capable of supporting analysts in regulated sectors, such as finance and healthcare, within tabular data workflows. A key capability for such systems is performing accurate arithmetic operations on structured data while ensuring sensitive information never leaves secure, on-premises environments. Here, we introduce an error-driven optimization framework for arithmetic reasoning that enhances a Code Generation Agent (CGA), specifically applied to on-premises small language models (SLMs). Through a systematic evaluation of a leading SLM (Qwen3 4B), we find that while the base model exhibits fundamental limitations in arithmetic tasks, our proposed error-driven method, which clusters erroneous predictions to refine prompt-rules iteratively, dramatically improves performance, elevating the model's…
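
The loop the abstract describes (collect erroneous predictions, cluster them, turn a cluster into a new prompt rule, repeat) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `refine_rules`, `toy_predict`, `toy_propose`, the pre-assigned error tags used as a stand-in for clustering, and the toy examples are hypothetical and are not taken from the paper, whose actual clustering and rule-generation procedure is not given in this excerpt.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical example format: (question, expected_answer, error_tag).
Example = tuple[str, float, str]


def refine_rules(
    examples: list[Example],
    predict: Callable[[str, list[str]], float],
    propose_rule: Callable[[str, list[Example]], str],
    max_rounds: int = 5,
) -> list[str]:
    """Iteratively add prompt rules until no errors remain or rounds run out."""
    rules: list[str] = []
    for _ in range(max_rounds):
        # Collect the examples the current rule set still gets wrong.
        errors = [ex for ex in examples if abs(predict(ex[0], rules) - ex[1]) > 1e-9]
        if not errors:
            break
        # Group errors; a pre-assigned tag stands in for real clustering (assumption).
        clusters: dict[str, list[Example]] = defaultdict(list)
        for ex in errors:
            clusters[ex[2]].append(ex)
        # Address the largest error cluster first.
        tag, members = max(clusters.items(), key=lambda kv: len(kv[1]))
        rule = propose_rule(tag, members)
        if rule in rules:
            break  # no new rule could be proposed, so stop
        rules.append(rule)
    return rules


# Toy data and stand-ins so the sketch runs without a language model.
EXAMPLES: list[Example] = [
    ("20% of 50", 10.0, "percent"),
    ("10% of 30", 3.0, "percent"),
    ("round 2.345 to 1 decimal", 2.3, "rounding"),
]
TAGS = {q: tag for q, _, tag in EXAMPLES}
TRUTH = {q: ans for q, ans, _ in EXAMPLES}


def toy_predict(question: str, rules: list[str]) -> float:
    # Toy model: correct only once a rule covering the question's tag exists.
    covered = any(rule.startswith(TAGS[question]) for rule in rules)
    return TRUTH[question] if covered else -1.0


def toy_propose(tag: str, members: list[Example]) -> str:
    # Toy rule writer: a real system would draft a rule from the failing cases.
    return f"{tag}: handle {len(members)} failing case(s) explicitly"


if __name__ == "__main__":
    for rule in refine_rules(EXAMPLES, toy_predict, toy_propose):
        print(rule)
```

In this toy run the larger "percent" cluster is addressed first and the "rounding" cluster second, after which no errors remain and the loop stops. In a real setting `predict` would call the on-premises model with the rules added to its prompt, and `propose_rule` would draft a rule from the clustered failures.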

Taxonomy

Topics: Machine Learning and Data Classification · Multimodal Machine Learning Applications · Topic Modeling