Cutting Through the Noise: Boosting LLM Performance on Math Word Problems

Ujjwala Anantheswaran; Himanshu Gupta; Kevin Scaria; Shreyas Verma; Chitta Baral; Swaroop Mishra

arXiv:2406.15444 · cs.CL · September 17, 2025

TL;DR

This paper introduces a prompting framework that generates adversarial math word problems by adding irrelevant variables, together with the PROBLEMATHIC dataset. Fine-tuning on the dataset's adversarial samples makes large language models more robust to irrelevant information, and the paper also evaluates how well that robustness generalizes.

Contribution

It presents a novel adversarial dataset and a prompting framework to boost LLM robustness against irrelevant information in math word problems.

Findings

01

Without mitigation, LLMs show an average relative performance drop of ~26% on adversarial MWPs.

02

Fine-tuning on adversarial samples improves LLM performance on adversarial MWPs by ~8%.

03

On the adversarial benchmark GSM-8K-Adv, LLM performance decreases by up to 6%.

Abstract

Large Language Models (LLMs) excel at various tasks, including solving math word problems (MWPs), but struggle with real-world problems containing irrelevant information. To address this, we propose a prompting framework that generates adversarial variants of MWPs by adding irrelevant variables. We introduce a dataset, PROBLEMATHIC, containing both adversarial and non-adversarial MWPs. Our experiments reveal that LLMs are susceptible to distraction by numerical noise, resulting in an average relative performance drop of ~26% on adversarial MWPs. To mitigate this, we fine-tune LLMs (Llama-2, Mistral) on the adversarial samples from our dataset. Fine-tuning on adversarial training instances improves performance on adversarial MWPs by ~8%, indicating increased robustness to noise and improved ability to identify relevant data for reasoning. Finally, to assess the generalizability of our…
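The "relative performance drop" quoted in the abstract can be read as the accuracy lost on adversarial problems, expressed as a fraction of the accuracy on the original problems. The sketch below assumes that definition. The function names and the accuracy values are hypothetical, chosen only for illustration, and are not results from the paper.

```python
def relative_drop(clean_acc: float, adv_acc: float) -> float:
    """Fraction of clean accuracy lost on adversarial inputs (assumed definition)."""
    if clean_acc <= 0:
        raise ValueError("clean_acc must be positive")
    return (clean_acc - adv_acc) / clean_acc


def mean_relative_drop(pairs):
    """Average the per-model relative drops over (clean, adversarial) accuracy pairs."""
    drops = [relative_drop(clean, adv) for clean, adv in pairs]
    return sum(drops) / len(drops)


# Hypothetical accuracies for two models; NOT taken from the paper.
example_pairs = [(0.80, 0.60), (0.50, 0.40)]
print(f"mean relative drop: {mean_relative_drop(example_pairs):.3f}")
```

Note that a relative drop differs from an absolute one: under this definition, falling from 80% to 60% accuracy is a 20-point absolute drop but a 25% relative drop.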

Code & Models

Repositories

him1411/problemathic
Official

Datasets

him1411/problemathic
dataset · 119 downloads

Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing