AgentNoiseBench: Benchmarking Robustness of Tool-Using LLM Agents Under Noisy Condition

Ruipeng Wang; Yuxin Chen; Yukai Wang; Chang Wu; Junfeng Fang; Xiaodong Cai; Qi Gu; Hui Su; An Zhang; Xiang Wang; Xunliang Cai; Tat-Seng Chua

arXiv:2602.11348·cs.AI·February 19, 2026

TL;DR

This paper introduces AgentNoiseBench, a framework for systematically evaluating the robustness of tool-using LLM agents under noisy real-world conditions, revealing their sensitivity to environmental perturbations.

Contribution

We develop a noise-injection pipeline for agent benchmarks and provide extensive evaluations showing how noise impacts LLM agent performance.

Findings

01. Models exhibit significant performance drops under noise.
02. Performance varies with different noise types and levels.
03. Current agents are sensitive to real-world environmental noise.

Abstract

Recent advances in large language models have enabled LLM-based agents to achieve strong performance on a variety of benchmarks. However, their performance in real-world deployments often falls short of that observed in benchmark settings, especially in complex and imperfect environments. This discrepancy largely arises because prevailing training and evaluation paradigms are typically built on idealized assumptions, overlooking the inherent stochasticity and noise present in real-world interactions. To bridge this gap, we introduce AgentNoiseBench, a framework for systematically evaluating the robustness of agentic models under noisy environments. We first conduct an in-depth analysis of biases and uncertainties in real-world scenarios and categorize environmental noise into two primary types: user-noise and tool-noise. Building on this analysis, we develop an automated pipeline that injects…
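
The distinction between user-noise and tool-noise can be illustrated with a small sketch. This is not the paper's pipeline, which the truncated abstract does not describe; the function names, the choice of typos as user-noise and simulated failures as tool-noise, and the rate parameters are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict


def add_user_noise(message: str, rng: random.Random, typo_rate: float = 0.1) -> str:
    """Hypothetical user-noise: swap adjacent letters to mimic typing errors."""
    chars = list(message)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < typo_rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each character moves at most once
        else:
            i += 1
    return "".join(chars)


def add_tool_noise(
    tool: Callable[[str], Dict], rng: random.Random, failure_rate: float = 0.2
) -> Callable[[str], Dict]:
    """Hypothetical tool-noise: wrap a tool so some calls return an error payload."""
    def noisy_tool(query: str) -> Dict:
        if rng.random() < failure_rate:
            return {"status": "error", "message": "temporarily unavailable"}
        return tool(query)
    return noisy_tool


def weather_tool(city: str) -> Dict:
    """Stand-in tool with a fixed answer (assumed for this example)."""
    return {"status": "ok", "city": city, "forecast": "sunny"}


rng = random.Random(0)  # fixed seed so the perturbations are reproducible
original = "What is the weather in Paris tomorrow?"
noisy_message = add_user_noise(original, rng)
noisy_weather = add_tool_noise(weather_tool, rng)
results = [noisy_weather("Paris") for _ in range(100)]
error_count = sum(r["status"] == "error" for r in results)
print(noisy_message)
print(f"{error_count} of 100 tool calls returned an error")
```

In a sketch like this, an agent would be run once against the clean message and tool and once against the noisy versions, and the difference in task success would serve as a rough robustness measure.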

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications