ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving

Zain Ul Abedin; Shahzeb Qamar; Lucie Flek; Akbar Karimi

arXiv:2501.08203 · cs.CL · March 17, 2026

TL;DR

This paper introduces ArithmAttack, a method to evaluate the robustness of large language models in math problem solving when faced with noisy prompts containing extra punctuation, revealing their vulnerability to such noise.

Contribution

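The paper presents ArithmAttack, a simple approach for assessing LLM robustness to noisy inputs without information loss: punctuation marks are inserted into the prompt, and no words are added or deleted. It evaluates eight LLMs under this attack on noisy versions of GSM8K and MultiArith.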

Findings

01 All models are vulnerable to noisy prompts.

02 Performance degrades as noise increases.

03 Robustness varies across different LLMs.
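
One way to quantify the findings above is to compare accuracy on clean prompts with accuracy at each noise level. The sketch below is illustrative only: the function names, the exact-match scoring, and the use of an absolute accuracy drop are assumptions, not the paper's stated evaluation protocol.

```python
def accuracy(predictions, gold_answers):
    """Fraction of predictions that exactly match the gold answers."""
    if len(predictions) != len(gold_answers):
        raise ValueError("predictions and gold_answers must have equal length")
    if not gold_answers:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold_answers))
    return correct / len(gold_answers)


def accuracy_drop_by_noise(gold_answers, clean_predictions, noisy_predictions):
    """Map each noise level to (clean accuracy - noisy accuracy).

    `noisy_predictions` is a dict {noise_level: list_of_predictions}.
    This is a hypothetical helper, not taken from the paper's repository.
    """
    clean_acc = accuracy(clean_predictions, gold_answers)
    return {
        level: clean_acc - accuracy(preds, gold_answers)
        for level, preds in noisy_predictions.items()
    }
```

A larger value at a given noise level means the model lost more accuracy there; comparing these values across models gives a rough view of which ones are more robust.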

Abstract

While Large Language Models (LLMs) have shown impressive capabilities in math problem-solving tasks, their robustness to noisy inputs is not well studied. We propose ArithmAttack to examine how robust LLMs are when they encounter noisy prompts containing extra punctuation marks. While being easy to implement, ArithmAttack does not cause any information loss, since words are neither added to nor deleted from the context. We evaluate the robustness of eight LLMs, including Llama3, Mistral, Mathstral, and DeepSeek, on noisy GSM8K and MultiArith datasets. Our experiments suggest that all the studied models are vulnerable to such noise, with more noise leading to poorer performance.
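
The punctuation noise described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the punctuation set, the `noise_ratio` parameterization (inserted marks as a fraction of the word count), and both function names are assumptions, since the abstract does not specify them.

```python
import random

# Assumed punctuation set; the abstract does not list the marks used.
PUNCTUATION = [".", ",", "!", "?", ";", ":"]


def add_punctuation_noise(text, noise_ratio, seed=None):
    """Insert random punctuation marks between the words of `text`.

    `noise_ratio` is the number of inserted marks as a fraction of the
    word count (an assumed parameterization). Existing words are never
    added, removed, or altered.
    """
    rng = random.Random(seed)
    words = text.split()
    if not words or noise_ratio <= 0:
        return " ".join(words)
    n_insert = max(1, round(noise_ratio * len(words)))
    noisy = list(words)
    for _ in range(n_insert):
        position = rng.randint(0, len(noisy))  # any gap, including both ends
        noisy.insert(position, rng.choice(PUNCTUATION))
    return " ".join(noisy)


def strip_noise(text):
    """Drop standalone punctuation tokens, undoing the insertion above.

    Assumes the original text had no standalone punctuation tokens.
    """
    return " ".join(tok for tok in text.split() if tok not in PUNCTUATION)
```

Because the marks are inserted as separate tokens, removing them recovers the original question exactly, which is one way to check the "no information loss" property: `strip_noise(add_punctuation_noise(q, 0.3))` returns `q` for any single-spaced question `q` without standalone punctuation.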

Code & Models

Repositories

caisa-lab/arithmattack (PyTorch, official)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning