Search-Induced Issues in Web-Augmented LLM Code Generation: Detecting and Repairing Error-Inducing Pages
Guoqing Wang, Zeyu Sun, Xiaofei Xie, Yizhou Chen, Yanchao Tan, Yifan Zhao, Dan Hao

TL;DR
This paper studies how web-augmented LLMs are vulnerable to Search-Induced Issues (SII) and proposes Sherlock, an automated framework that detects, debugs, and repairs such issues to make code generation more reliable.
Contribution
The paper introduces Sherlock, an automated system that detects, debugs, and repairs Search-Induced Issues in web-augmented LLM code generation at scale.
Findings
Sherlock detects Error-Inducing Pages (EIPs) with an F1 score of up to 95%.
Sherlock repairs 71% to 100% of affected generations.
All evaluated web-augmented LLMs are vulnerable to SII.
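For readers unfamiliar with the metric in the first finding, here is a minimal sketch of how an F1 score is computed for a binary "is this page an Error-Inducing Page (EIP)?" detector. The function name `f1_score` and the five labels are hypothetical and chosen only for illustration; they are not data or code from the paper, and the paper's own evaluation setup may differ.

```python
def f1_score(predicted, actual):
    """F1 (harmonic mean of precision and recall) for binary labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # flagged, and really an EIP
    fp = sum(1 for p, a in pairs if p and not a)    # flagged, but benign
    fn = sum(1 for p, a in pairs if not p and a)    # missed EIP
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for five searched pages (True = EIP); not from the paper.
actual = [True, True, False, True, False]
predicted = [True, True, True, False, False]
score = f1_score(predicted, actual)
```

With these made-up labels the detector has two true positives, one false positive, and one missed EIP, so precision and recall are both 2/3 and the F1 score is also 2/3.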
Abstract
Web-augmented large language models (LLMs) offer promising capabilities for automatic code generation. However, integrating live web search exposes models to unreliable or malicious content, leading to Search-Induced Issues (SII), a novel failure mode in which external pages mislead LLMs into producing incorrect code. This paper presents a comprehensive empirical study of the prevalence and impact of SII across three commercial search APIs and six advanced LLMs. Our analysis reveals that all evaluated web-augmented LLMs are vulnerable to SII, with root causes arising from either misaligned specifications or flawed code implementations in the searched Error-Inducing Pages (EIPs). To address this challenge, we propose Sherlock, an automated framework that enables LLM service providers to proactively safeguard web-augmented generation systems at scale. Sherlock operates as a continuous…
