Learning to Triage Taint Flows Reported by Dynamic Program Analysis in Node.js Packages
Ronghao Ni, Aidan Z.H. Yang, Min-Chien Hsu, Nuno Sabino, Limin Jia, Ruben Martins, Darion Cassel, Kevin Cheang

TL;DR
This paper explores machine learning techniques, including LLMs and GNNs, to prioritize vulnerability reports from dynamic analysis in Node.js packages, significantly reducing manual review effort and improving detection accuracy.
Contribution
It introduces a benchmark dataset of Node.js vulnerabilities and evaluates various ML models, demonstrating the effectiveness of LLMs and GNNs in vulnerability triage.
Findings
Top LLM achieves F1=0.915 in vulnerability classification.
Best GNN and classical ML models reach F1=0.904.
Approach can detect 99.2% of exploitable taint flows at 80% precision.
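The figures above are threshold-based classification metrics. As a minimal sketch of how such numbers can be computed, the snippet below derives precision, recall, and F1 from a model's scores, then searches for the threshold that gives the highest recall while keeping precision at or above a target. It is an illustration only and not the paper's evaluation code: the function names, scores, and labels are hypothetical toy values.

```python
from typing import List, Optional, Tuple


def precision_recall_f1(labels: List[int], flagged: List[bool]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for binary labels (1 = exploitable flow)."""
    tp = sum(1 for y, f in zip(labels, flagged) if y and f)
    fp = sum(1 for y, f in zip(labels, flagged) if not y and f)
    fn = sum(1 for y, f in zip(labels, flagged) if y and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_recall_at_precision(
    scores: List[float], labels: List[int], min_precision: float
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is at least min_precision, or None if no
    threshold qualifies."""
    best = None
    for threshold in sorted(set(scores)):
        flagged = [s >= threshold for s in scores]
        precision, recall, _ = precision_recall_f1(labels, flagged)
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (threshold, precision, recall)
    return best


# Hypothetical toy data: model scores for eight reported taint flows and
# whether each flow was actually exploitable (1) or not (0).
toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]

result = best_recall_at_precision(toy_scores, toy_labels, min_precision=0.8)
print(result)
```

On this toy data, flagging every flow scored 0.6 or higher catches all four exploitable flows with one false positive, so the search settles on that threshold. A triage workflow would then send only the flagged flows to manual review.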
Abstract
Program analysis tools often produce large volumes of candidate vulnerability reports that require costly manual review, creating a practical challenge: how can security analysts prioritize the reports most likely to be true vulnerabilities? This paper investigates whether machine learning can be applied to prioritizing vulnerabilities reported by program analysis tools. We focus on Node.js packages and collect a benchmark of 1,883 Node.js packages, each containing one reported ACE or ACI vulnerability. We evaluate a variety of machine learning approaches, including classical models, graph neural networks (GNNs), large language models (LLMs), and hybrid models that combine GNNs and LLMs, trained on data based on a dynamic program analysis tool's output. The top LLM achieves F1=0.915, while the best GNN and classical ML models reach F1=0.904. At a less than 7%…
Taxonomy
Topics: Web Application Security Vulnerabilities · Security and Verification in Computing · Advanced Malware Detection Techniques
