LibVulnWatch: A Deep Assessment Agent System and Leaderboard for Uncovering Hidden Vulnerabilities in Open-Source AI Libraries

Zekun Wu; Seonglae Cho; Umar Mohammed; Cristian Munoz; Kleyton Costa; Xin Guan; Theo King; Ze Wang; Emre Kazim; Adriano Koshiyama

arXiv:2505.08842 · cs.CR · July 1, 2025

TL;DR

LibVulnWatch uses large language models and agentic workflows to assess security, licensing, maintenance, supply chain, and regulatory compliance risks in open-source AI libraries, and publishes the results to a public leaderboard for ongoing monitoring.

Contribution

It introduces a novel, scalable framework combining language models and agent orchestration for comprehensive risk assessment of open-source AI libraries.

Findings

01. Covered up to 88% of OpenSSF Scorecard checks
02. Identified up to 19 additional risks per library
03. Applied to 20 widely used AI libraries

Abstract

Open-source AI libraries are foundational to modern AI systems, yet they present significant, underexamined risks spanning security, licensing, maintenance, supply chain integrity, and regulatory compliance. We introduce LibVulnWatch, a system that leverages recent advances in large language models and agentic workflows to perform deep, evidence-based evaluations of these libraries. Built on a graph-based orchestration of specialized agents, the framework extracts, verifies, and quantifies risk using information from repositories, documentation, and vulnerability databases. LibVulnWatch produces reproducible, governance-aligned scores across five critical domains, publishing results to a public leaderboard for ongoing ecosystem monitoring. Applied to 20 widely used libraries, including ML frameworks, LLM inference engines, and agent orchestration tools, our approach covers up to 88% of…
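The extract, verify, and quantify flow described in the abstract can be illustrated with a toy graph of agent nodes. This is a minimal sketch under stated assumptions, not the authors' implementation: the node functions, the `State` container, the canned evidence, and the count-based scoring are all hypothetical, and the five domain names are assumed from the risk areas the abstract lists. A real system would replace the stubbed `extract` step with LLM-backed agents querying repositories, documentation, and vulnerability databases.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumption: the five scored domains mirror the risk areas named in the abstract.
DOMAINS = ["security", "licensing", "maintenance", "supply_chain", "regulatory"]


@dataclass
class State:
    """Hypothetical shared state passed from node to node."""
    library: str
    evidence: Dict[str, List[str]] = field(default_factory=dict)
    verified: Dict[str, List[str]] = field(default_factory=dict)
    scores: Dict[str, int] = field(default_factory=dict)


def extract(state: State) -> State:
    # Stand-in for an evidence-gathering agent; the findings here are canned.
    raw = {
        "security": ["CVE reference found", ""],
        "licensing": ["LICENSE file present"],
    }
    state.evidence = {d: raw.get(d, []) for d in DOMAINS}
    return state


def verify(state: State) -> State:
    # Toy verification: drop empty evidence strings.
    state.verified = {
        d: [item for item in items if item.strip()]
        for d, items in state.evidence.items()
    }
    return state


def quantify(state: State) -> State:
    # Toy quantification: number of verified evidence items per domain.
    state.scores = {d: len(items) for d, items in state.verified.items()}
    return state


# The graph: node name -> function, and node name -> successor names.
NODES: Dict[str, Callable[[State], State]] = {
    "extract": extract,
    "verify": verify,
    "quantify": quantify,
}
EDGES: Dict[str, List[str]] = {
    "extract": ["verify"],
    "verify": ["quantify"],
    "quantify": [],
}


def run_graph(start: str, state: State) -> Tuple[State, List[str]]:
    """Walk the graph breadth-first from `start`, applying each node once."""
    order: List[str] = []
    queue = [start]
    while queue:
        name = queue.pop(0)
        state = NODES[name](state)
        order.append(name)
        queue.extend(EDGES[name])
    return state, order


if __name__ == "__main__":
    result, order = run_graph("extract", State(library="example-lib"))
    print(order)
    print(result.scores)
```

Keeping the graph as plain dictionaries makes the control flow easy to inspect; adding a specialized agent in this sketch would mean registering one more function in `NODES` and wiring it into `EDGES`.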

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Scientific Computing and Data Management · Advanced Malware Detection Techniques

MethodsLib