Error-Driven Prompt Optimization for Arithmetic Reasoning
Árpád Pándy, Róbert Lakatos, András Hajdu

TL;DR
This paper presents an error-driven prompt optimization method that significantly improves the arithmetic reasoning accuracy of small language models, making them more reliable and suitable for industrial, privacy-sensitive applications.
Contribution
The paper introduces an iterative, error-driven prompt refinement framework that enhances small language models' arithmetic reasoning without fine-tuning, outperforming larger models in privacy-sensitive settings.
Findings
Model accuracy improved to 70.8% with the proposed method.
Error clustering and prompt refinement significantly boost arithmetic performance.
Small models can surpass larger models like GPT-3.5 Turbo in privacy-preserving scenarios.
Abstract
Recent advancements in artificial intelligence have sparked interest in industrial agents capable of supporting analysts in regulated sectors, such as finance and healthcare, within tabular data workflows. A key capability for such systems is performing accurate arithmetic operations on structured data while ensuring sensitive information never leaves secure, on-premises environments. Here, we introduce an error-driven optimization framework for arithmetic reasoning that enhances a Code Generation Agent (CGA), specifically applied to on-premises small language models (SLMs). Through a systematic evaluation of a leading SLM (Qwen3 4B), we find that while the base model exhibits fundamental limitations in arithmetic tasks, our proposed error-driven method, which clusters erroneous predictions to refine prompt-rules iteratively, dramatically improves performance, elevating the model's…
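The loop the abstract describes (evaluate, collect erroneous predictions, cluster them, add a prompt rule, repeat) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `toy_model` is a hypothetical stand-in for the SLM, errors are "clustered" simply by operation type, and `RULE_BANK` is a hypothetical lookup replacing whatever rule-generation step the paper actually uses.

```python
from collections import Counter

# Hypothetical rule lookup: one prompt rule per error cluster (assumption).
RULE_BANK = {
    "pct": "pct: divide by 100",
    "div": "div: use true division",
}

# Tiny hypothetical benchmark of arithmetic tasks.
TASKS = [
    {"op": "add", "a": 2, "b": 3},
    {"op": "pct", "a": 200, "b": 15},
    {"op": "pct", "a": 50, "b": 10},
    {"op": "div", "a": 7, "b": 2},
]


def expected(task):
    """Ground-truth answer for a task."""
    a, b, op = task["a"], task["b"], task["op"]
    if op == "add":
        return a + b
    if op == "pct":
        return a * b / 100
    if op == "div":
        return a / b
    raise ValueError(f"unknown op: {op}")


def toy_model(prompt_rules, task):
    """Toy stand-in for an SLM: it errs on an operation until the
    matching rule is present in the prompt."""
    a, b, op = task["a"], task["b"], task["op"]
    if op == "add":
        return a + b
    if op == "pct":
        # Without the rule, the toy model forgets to divide by 100.
        return a * b / 100 if RULE_BANK["pct"] in prompt_rules else a * b
    if op == "div":
        # Without the rule, the toy model uses integer division.
        return a / b if RULE_BANK["div"] in prompt_rules else a // b
    raise ValueError(f"unknown op: {op}")


def evaluate(prompt_rules, tasks):
    """Return (accuracy, list of tasks the model got wrong)."""
    errors = [t for t in tasks
              if abs(toy_model(prompt_rules, t) - expected(t)) > 1e-9]
    return 1 - len(errors) / len(tasks), errors


def refine(tasks, max_rounds=5):
    """Iteratively add the rule for the largest error cluster."""
    rules, history = [], []
    for _ in range(max_rounds):
        accuracy, errors = evaluate(rules, tasks)
        history.append(accuracy)
        if not errors:
            break
        # Cluster errors by operation type (a deliberately crude proxy).
        clusters = Counter(t["op"] for t in errors)
        top_cluster, _ = clusters.most_common(1)[0]
        rule = RULE_BANK.get(top_cluster)
        if rule is None or rule in rules:
            break  # no new rule available for this cluster
        rules.append(rule)
    return rules, history


if __name__ == "__main__":
    rules, history = refine(TASKS)
    print(rules, history)
```

In this toy setup each round removes the largest remaining error cluster, so accuracy rises step by step as rules accumulate. A real system would replace `toy_model` with calls to the on-premises model and would derive clusters and rules from the observed errors rather than from a fixed table.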
Taxonomy
Topics: Machine Learning and Data Classification · Multimodal Machine Learning Applications · Topic Modeling
