Towards a Benchmark for Dependency Decision-Making
Tanmay Singla, Berk Çakar, Paschal C. Amusuo, James C. Davis

TL;DR
This paper introduces DepDec-Bench, a benchmark for evaluating dependency decision-making in AI coding agents, emphasizing security, efficiency, and policy compliance beyond mere functional correctness.
Contribution
The paper presents a benchmark and evaluation framework grounded in real-world dependency change data, and it highlights security and policy considerations that existing evaluations often overlook.
Findings
Agents often select vulnerable dependency versions.
Dependency decisions can have negative security impacts.
Benchmark evaluates safe, disciplined dependency management.
Abstract
AI coding agents increasingly modify real software repositories and make dependency decisions, including adding, removing, or updating third-party packages. These choices can materially affect security posture and maintenance burden, yet repository-level evaluations largely emphasize test passing and executability without explicitly scoring whether systems (i) reuse existing dependencies, (ii) avoid unnecessary additions, or (iii) select versions that satisfy security and policy constraints. We propose DepDec-Bench, a benchmark for evaluating dependency decision-making beyond functional correctness. To ground DepDec-Bench in real-world behavior, we conduct a preliminary study of 117,062 dependency changes from agent- and human-authored pull requests across seven ecosystems. We show that coding agents frequently make dependency decisions with security consequences that remain invisible…
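The three criteria listed in the abstract can be illustrated with a minimal sketch. This is not the DepDec-Bench implementation, which the text above does not describe. Every name here (`DependencyChange`, `check_change`, and the `existing`, `equivalents`, `vulnerable`, and `required` inputs) is a hypothetical stand-in, and the package names are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DependencyChange:
    """One dependency edit in a pull request (hypothetical model)."""
    name: str      # package name
    version: str   # exact version the change selects
    action: str    # "add", "remove", or "update"


def check_change(change, existing, equivalents, vulnerable, required):
    """Evaluate one change against three illustrative checks.

    existing:    package names already declared in the repository
    equivalents: maps a package name to packages assumed interchangeable with it
    vulnerable:  (name, version) pairs assumed to have known advisories
    required:    package names the task is assumed to genuinely need
    """
    is_add = change.action == "add"
    # Existing packages that could have been reused instead of this one.
    reusable = equivalents.get(change.name, set()) & existing
    return {
        # (i) adding a package fails reuse if an equivalent is already present
        "reuses_existing": not (is_add and bool(reusable)),
        # (ii) adding a package the task does not need is unnecessary
        "avoids_unnecessary": not (is_add and change.name not in required),
        # (iii) an added or updated version must not be known-vulnerable
        "version_is_safe": change.action == "remove"
        or (change.name, change.version) not in vulnerable,
    }


if __name__ == "__main__":
    change = DependencyChange(name="pkg-http", version="1.0.0", action="add")
    print(check_change(
        change,
        existing={"pkg-net"},
        equivalents={"pkg-http": {"pkg-net"}},
        vulnerable={("pkg-http", "1.0.0")},
        required={"pkg-http"},
    ))
```

Each check is a boolean so that it can be reported separately from test results, which matches the abstract's point that passing tests alone does not reveal these outcomes. A real benchmark would need advisory data and a principled notion of package equivalence, both of which are simply assumed here.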
Taxonomy
Topics: Software Engineering Research · Information and Cyber Security · Advanced Malware Detection Techniques
