When Agents Overtrust Environmental Evidence: An Extensible Agentic Framework for Benchmarking Evidence-Grounding Defects in LLM Agents
Strick Sheng, Ziyue Wang, Liyi Zhou

TL;DR
This paper introduces EnvTrustBench, a framework for evaluating whether large language model agents reliably ground their actions in current environmental evidence, addressing a critical reliability and security issue.
Contribution
It defines evidence-grounding defects (EGDs) and provides a systematic benchmarking approach for detecting these failures in LLM agents across a variety of scenarios.
Findings
EGDs are common across different LLM backbones and scaffolds
Environmental grounding failures can lead to incorrect agent actions
The benchmark reveals security implications of grounding defects
Abstract
Large language model agents increasingly operate through environment-facing scaffolds that expose files, web pages, APIs, and logs. These observations influence tool use, state tracking, and action sequencing, yet their reliability and authority are often uncertain. Environmental grounding is therefore a systems-level problem involving context admission, evidence provenance, freshness checking, verification policy, action gating, and model reasoning. Existing agent benchmarks mainly evaluate task capability or specific attacks such as prompt injection and memory poisoning, but they under-specify a fundamental reliability question: whether agents remain grounded in the true environment state when observations are stale, incorrect, or malicious. We introduce EnvTrustBench, an agentic framework for benchmarking this failure mode. We define an evidence-grounding defect (EGD) as a…
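The abstract names evidence provenance, freshness checking, and action gating as parts of the grounding problem. The sketch below is only an illustration of those ideas, not EnvTrustBench's method or API. It assumes a scaffold that tags each observation with a source and a timestamp. The names `Observation`, `Source`, `needs_reverification`, and `gate_action`, as well as the 30-second threshold, are all hypothetical. It shows how stale or untrusted evidence could be re-checked against live environment state before an action relies on it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Source(Enum):
    """Hypothetical provenance labels a scaffold might attach to observations."""
    LIVE_PROBE = "live_probe"  # read directly from the environment for this step
    CACHED = "cached"          # an earlier observation still sitting in the agent's context
    UNTRUSTED = "untrusted"    # free text such as a web page, README, or log line


@dataclass(frozen=True)
class Observation:
    key: str            # which piece of state this describes, e.g. a file path
    value: str          # what the observation claims that state is
    source: Source
    observed_at: float  # timestamp in seconds


def needs_reverification(obs: Observation, now: float, max_age_s: float) -> bool:
    """Untrusted or stale evidence should not authorize an action by itself."""
    if obs.source is Source.UNTRUSTED:
        return True
    return now - obs.observed_at > max_age_s


def gate_action(
    obs: Observation,
    probe: Callable[[str], str],
    now: float,
    max_age_s: float = 30.0,  # hypothetical freshness threshold
) -> str:
    """Return the value an action is allowed to rely on.

    `probe` reads the true current value for `obs.key` from the environment.
    It is called only when the observation fails the provenance/freshness check.
    """
    if needs_reverification(obs, now, max_age_s):
        return probe(obs.key)
    return obs.value


if __name__ == "__main__":
    env = {"config.yaml": "version: 2"}  # ground-truth environment state
    stale = Observation("config.yaml", "version: 1", Source.CACHED, observed_at=0.0)
    print(gate_action(stale, env.__getitem__, now=100.0))  # version: 2
```

In this toy run the cached observation is 100 seconds old, so the gate re-probes the environment and the action sees `version: 2` rather than the outdated `version: 1`. A real scaffold would also need to decide what counts as a trustworthy live read and how much re-probing it can afford. The single fixed threshold here is only for illustration.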
