More Agents Improve Math Problem Solving but Adversarial Robustness Gap Persists
Khashayar Alavi, Zhastay Yeltay, Lucie Flek, Akbar Karimi

TL;DR
This study shows that while increasing the number of collaborating LLM agents improves mathematical problem-solving accuracy, their vulnerability to adversarial inputs, especially human-like typos, remains largely unmitigated.
Contribution
It introduces a unified framework to evaluate the robustness of multi-agent LLM systems against adversarial perturbations in math questions.
Findings
Accuracy improves as more agents collaborate, with the clearest gains between 1 and 5 agents.
The adversarial robustness gap persists regardless of the number of agents.
Human-like typos cause the largest accuracy drops and the highest attack success rates.
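The two quantities behind these findings, the gap to Clean accuracy and the attack success rate (ASR), can be sketched as follows. This is only an illustration: the summary does not give the paper's exact ASR formula, so the definition used here (the share of questions answered correctly on clean input that become wrong after perturbation) is an assumption, and the function names and toy data are hypothetical.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold answers."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def robustness_gap(clean_preds, noisy_preds, gold):
    """Clean accuracy minus accuracy on the perturbed questions."""
    return accuracy(clean_preds, gold) - accuracy(noisy_preds, gold)


def attack_success_rate(clean_preds, noisy_preds, gold):
    """ASSUMED definition: among questions answered correctly on clean
    input, the fraction answered incorrectly after perturbation."""
    clean_correct = 0
    flipped = 0
    for c, n, g in zip(clean_preds, noisy_preds, gold):
        if c == g:
            clean_correct += 1
            if n != g:
                flipped += 1
    return flipped / clean_correct if clean_correct else 0.0


# Toy data, invented purely for illustration (not results from the paper).
gold = ["4", "6", "8", "10"]
clean_preds = ["4", "6", "8", "9"]
noisy_preds = ["4", "5", "7", "9"]

gap = robustness_gap(clean_preds, noisy_preds, gold)
asr = attack_success_rate(clean_preds, noisy_preds, gold)
```

On this toy data, three of four clean predictions are correct and two of those three flip under perturbation, so the gap and the ASR measure related but distinct things: the gap is computed over all questions, while the ASR conditions on questions the system originally solved.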
Abstract
When LLM agents work together, they seem to be more powerful than a single LLM in mathematical question answering. However, are they also more robust to adversarial inputs? We investigate this question using adversarially perturbed math questions. These perturbations include punctuation noise with three intensities (10%, 30%, 50%), plus real-world and human-like typos (WikiTypo, R2ATA). Using a unified sampling-and-voting framework (Agent Forest), we evaluate six open-source models (Qwen3-4B/14B, Llama3.1-8B, Mistral-7B, Gemma3-4B/12B) across four benchmarks (GSM8K, MATH, MMLU-Math, MultiArith), with various numbers of agents n ∈ {1, 2, 5, 10, 15, 20, 25}. Our findings show that 1) Noise type matters: the harm from punctuation noise scales with its severity, and human typos remain the dominant bottleneck, yielding the largest gaps to Clean accuracy and the highest attack success rate (ASR) even…
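The setup described in the abstract, perturbing a question and then aggregating several sampled answers by voting, can be sketched roughly as below. This is a simplified illustration and not the authors' implementation: the noise scheme (appending one random punctuation mark to a chosen fraction of words) is an assumption about what "punctuation noise" means, plain majority voting stands in for the Agent Forest aggregation step, and `add_punctuation_noise` and `majority_vote` are hypothetical names.

```python
import random
import string
from collections import Counter


def add_punctuation_noise(question, intensity, seed=0):
    """Append a random punctuation mark to a fraction of the words.

    ASSUMED scheme: `intensity` (e.g. 0.1, 0.3, 0.5) is the fraction of
    whitespace-separated words that receive one extra punctuation mark.
    """
    rng = random.Random(seed)
    words = question.split()
    n_noisy = round(len(words) * intensity)
    positions = set(rng.sample(range(len(words)), n_noisy))
    noisy = [
        word + rng.choice(string.punctuation) if i in positions else word
        for i, word in enumerate(words)
    ]
    return " ".join(noisy)


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


# Toy example: answers that n = 5 sampled agents might return (invented).
question = "Tom has 3 apples and buys 5 more so how many now"
noisy_question = add_punctuation_noise(question, intensity=0.3)
final_answer = majority_vote(["8", "8", "7", "8", "15"])
```

In a real run, each of the n answers would come from an independent sample of the same model on the (possibly perturbed) question; here they are hard-coded so the sketch stays self-contained.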
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications
