From Coverage to Causes: Data-Centric Fuzzing for JavaScript Engines

Kishan Kumar Ganguly; Tim Menzies

arXiv:2512.18102·cs.SE·December 23, 2025

Open Access

TL;DR

This paper introduces a data-centric, machine learning-based fuzzing approach for JavaScript engines that predicts high-risk inputs using static and dynamic features, improving efficiency and effectiveness over traditional coverage-guided methods.

Contribution

It presents a novel feature-guided fuzzing method that leverages historical vulnerabilities and machine learning to identify high-risk inputs with minimal instrumentation, replacing coverage-based heuristics.

Findings

01. Achieved over 85% precision in predicting high-risk inputs.

02. Only 25% of selected features are needed for comparable performance.

03. Most of the search space is irrelevant for vulnerability detection.

Abstract

Context: Exhaustive fuzzing of modern JavaScript engines is infeasible due to the vast number of program states and execution paths. Coverage-guided fuzzers waste effort on low-risk inputs and often ignore vulnerability-triggering inputs that do not increase coverage. Existing heuristics proposed to mitigate this require expert effort, are brittle, and are hard to adapt. Objective: We propose a data-centric, LLM-boosted alternative that learns from historical vulnerabilities to automatically identify minimal static (code) and dynamic (runtime) features for detecting high-risk inputs. Method: Guided by historical V8 bugs, iterative prompting generated 115 static and 49 dynamic features, with the latter requiring only five trace flags, minimizing instrumentation cost. After feature selection, 41 features remained to train an XGBoost model to predict high-risk inputs during fuzzing.…
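
The feature-guided idea in the abstract can be illustrated with a minimal sketch: extract a few static features from each candidate JavaScript input, score the input, and fuzz the highest-scoring candidates first. Everything named here is an assumption for illustration. The three features, their regular expressions, the weights, and the bias are hypothetical and are not the paper's features, and the hand-weighted logistic score is a stand-in for the trained XGBoost model, which the excerpt does not describe in detail.

```python
import math
import re

# Hypothetical static features. These names and patterns are illustrative
# only; they are NOT the 115 static features described in the paper.
STATIC_PATTERNS = {
    "typed_array_uses": re.compile(r"\b(?:Uint8|Int32|Float64)Array\b"),
    "proxy_uses": re.compile(r"\bnew\s+Proxy\b"),
    "loop_count": re.compile(r"\b(?:for|while)\b"),
}

# Hypothetical hand-picked weights standing in for a trained classifier.
# The paper trains an XGBoost model; this logistic score is a placeholder.
WEIGHTS = {"typed_array_uses": 1.5, "proxy_uses": 1.0, "loop_count": 0.5}
BIAS = -2.0


def extract_static_features(js_source: str) -> dict[str, int]:
    """Count occurrences of each illustrative pattern in the source text."""
    return {
        name: len(pattern.findall(js_source))
        for name, pattern in STATIC_PATTERNS.items()
    }


def risk_score(features: dict[str, int]) -> float:
    """Map feature counts to a score in (0, 1) with a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * count for name, count in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_by_risk(inputs: list[str]) -> list[tuple[float, str]]:
    """Return (score, source) pairs, highest predicted risk first."""
    scored = [(risk_score(extract_static_features(src)), src) for src in inputs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    candidates = [
        "var x = 1;",
        "let a = new Uint8Array(8); for (let i = 0; i < 9; i++) a[i] = i;",
    ]
    for score, src in rank_by_risk(candidates):
        print(f"{score:.3f}  {src}")
```

In a real pipeline the scoring function would be replaced by the trained model's predicted probability, and dynamic features would come from engine trace output rather than from source text. As a side note on the counts in the abstract, 41 retained features out of 115 + 49 = 164 generated ones is exactly 25%.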

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Web Application Security Vulnerabilities