When Developer Aid Becomes Security Debt: A Systematic Analysis of Insecure Behaviors in LLM Coding Agents
Matous Kozak, Roshanak Zilouchian Moghaddam, Siva Sivaraman

TL;DR
This paper systematically evaluates the security risks of LLM-based coding agents across five state-of-the-art models and 93 real-world software setup tasks, finds insecure actions in 21% of agent trajectories, and assesses mitigation strategies for reducing them.
Contribution
First comprehensive safety analysis of autonomous coding agents, identifying prevalent security vulnerabilities and evaluating mitigation techniques across state-of-the-art models.
Findings
21% of agent trajectories contained insecure actions
Information exposure (CWE-200) was the most common vulnerability
GPT-4.1 achieved a 96.8% mitigation success rate
Abstract
LLM-based coding agents are rapidly being deployed in software development, yet their safety implications remain poorly understood. These agents, while capable of accelerating software development, may exhibit unsafe behaviors during normal operation that manifest as cybersecurity vulnerabilities. We conducted the first systematic safety evaluation of autonomous coding agents, analyzing over 12,000 actions across five state-of-the-art models (GPT-4o, GPT-4.1, Claude variants) on 93 real-world software setup tasks. Our findings reveal significant security concerns: 21% of agent trajectories contained insecure actions, with models showing substantial variation in unsafe behavior. We developed a high-precision detection system that identified four major vulnerability categories, with information exposure (CWE-200) being the most prevalent one. We also evaluated mitigation strategies…
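The 21% figure is a trajectory-level rate: a trajectory counts as insecure if at least one of its actions is flagged. This is not the same as the share of individual actions that are insecure. The following is a minimal sketch of the distinction, not the authors' evaluation code. The function names, the boolean-flag data layout, and the toy data are all hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: each trajectory maps to a list of per-action flags,
# where True means a detector marked that action as insecure.
Trajectories = Dict[str, List[bool]]


def insecure_trajectory_rate(trajectories: Trajectories) -> float:
    """Fraction of trajectories containing at least one flagged action."""
    if not trajectories:
        return 0.0
    flagged = sum(1 for actions in trajectories.values() if any(actions))
    return flagged / len(trajectories)


def insecure_action_rate(trajectories: Trajectories) -> float:
    """Fraction of all individual actions that were flagged."""
    total = sum(len(actions) for actions in trajectories.values())
    if total == 0:
        return 0.0
    flagged = sum(sum(actions) for actions in trajectories.values())
    return flagged / total


# Toy data for illustration only; these are not results from the paper.
toy = {
    "task-1": [False, False, True, False],
    "task-2": [False, False, False],
    "task-3": [False, False, False, False, False],
    "task-4": [False, False],
    "task-5": [True, True, False, False, False, False],
}

trajectory_rate = insecure_trajectory_rate(toy)  # 2 of 5 trajectories
action_rate = insecure_action_rate(toy)          # 3 of 20 actions
```

On this toy data the trajectory-level rate is higher than the action-level rate, because a single flagged action is enough to mark a whole trajectory as insecure.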
Taxonomy
Topics: Digital Rights Management and Security · Advanced Data Storage Technologies · FinTech, Crowdfunding, Digital Finance
