MSCR: Exploring the Vulnerability of LLMs' Mathematical Reasoning Abilities Using Multi-Source Candidate Replacement
Zhishen Sun, Guang Dai, Haishan Ye

TL;DR
This paper introduces MSCR, an automated adversarial attack method that uses multi-source candidate replacement to test and reveal vulnerabilities in LLMs' mathematical reasoning abilities, showing significant accuracy drops with minimal input perturbations.
Contribution
The paper presents MSCR, a novel, scalable, and semantically consistent adversarial attack technique that combines multiple information sources to challenge LLMs on mathematical reasoning tasks.
Findings
Minor input perturbations can reduce model accuracy by up to 49.89%.
Perturbations increase response length and computational resource use.
Current LLMs show significant robustness weaknesses in mathematical reasoning.
Abstract
LLMs demonstrate performance comparable to human abilities in complex tasks such as mathematical reasoning, but their robustness in mathematical reasoning under minor input perturbations still lacks systematic investigation. Existing methods generally suffer from limited scalability, weak semantic preservation, and high costs. Therefore, we propose MSCR, an automated adversarial attack method based on multi-source candidate replacement. By combining three information sources including cosine similarity in the embedding space of LLMs, the WordNet dictionary, and contextual predictions from a masked language model, we generate for each word in the input question a set of semantically similar candidates, which are then filtered and substituted one by one to carry out the attack. We conduct large-scale experiments on LLMs using the GSM8K and MATH500 benchmarks. The results show that even a…
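The candidate-generation and one-by-one substitution loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy tables `TOY_EMBEDDINGS`, `TOY_SYNONYMS`, and `TOY_MLM` are hypothetical stand-ins for an LLM embedding space, WordNet, and a masked language model, and the similarity threshold, the union of the three sources, the `keep` filter, and the `is_correct` callback are all assumptions made for the example.

```python
import math

# Hypothetical toy vectors standing in for an LLM's embedding table.
TOY_EMBEDDINGS = {
    "buys": [0.9, 0.1, 0.0],
    "purchases": [0.85, 0.15, 0.05],
    "sells": [-0.8, 0.2, 0.1],
    "apples": [0.0, 0.9, 0.1],
}
# Hypothetical synonym table standing in for WordNet lookups.
TOY_SYNONYMS = {"buys": {"purchases", "acquires"}}
# Hypothetical fill-in predictions standing in for a masked language model.
TOY_MLM = {"buys": {"purchases", "gets", "sells"}}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_candidates(word, threshold=0.8):
    """Words whose toy embedding is close to `word` (threshold is assumed)."""
    if word not in TOY_EMBEDDINGS:
        return set()
    base = TOY_EMBEDDINGS[word]
    return {
        other
        for other, vec in TOY_EMBEDDINGS.items()
        if other != word and cosine(base, vec) >= threshold
    }


def candidates(word):
    """Merge the three sources; taking the union is an assumption."""
    return (
        embedding_candidates(word)
        | TOY_SYNONYMS.get(word, set())
        | TOY_MLM.get(word, set())
    )


def keep(word, cand):
    """Placeholder filter: the summary does not state the real criteria."""
    return cand != word and not any(ch.isdigit() for ch in cand)


def attack(question, is_correct):
    """Substitute one word at a time; return the first perturbed question
    that the model under test (via `is_correct`) gets wrong, else None."""
    tokens = question.split()
    for i, word in enumerate(tokens):
        for cand in sorted(candidates(word)):
            if not keep(word, cand):
                continue
            perturbed = " ".join(tokens[:i] + [cand] + tokens[i + 1:])
            if not is_correct(perturbed):
                return perturbed
    return None
```

In this sketch `is_correct` would wrap a call to the model under test plus an answer check. Note that the placeholder filter would let through a candidate such as "sells", which changes the meaning of the question; a real filter would need to reject such candidates to keep the perturbation semantically consistent.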
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Polynomial and algebraic computation · History and Theory of Mathematics
