SmartOracle -- An Agentic Approach to Mitigate Noise in Differential Oracles

Srinath Srinivasan; Tim Menzies; Marcelo D'Amorim

arXiv:2601.15074 · cs.SE · January 22, 2026

Open Access

TL;DR

SmartOracle employs specialized LLM-based agents to automate and improve the accuracy of differential oracle validation in JavaScript fuzzing, reducing manual effort and false positives.

Contribution

The paper introduces an agentic, LLM-based architecture that automates oracle validation in differential fuzzing, improving accuracy and efficiency.

Findings

01

Achieves 0.84 recall with an 18% false positive rate on historical benchmarks.

02

Reduces analysis time by 4× and API costs by 10× compared to the baseline.

03

Identified previously unknown bugs in major JavaScript engines.

Abstract

Differential fuzzers detect bugs by executing identical inputs across distinct implementations of the same specification, such as JavaScript interpreters. Validating the outputs requires an oracle, and for differential testing of JavaScript such oracles are constructed manually, which makes them expensive, time-consuming, and prone to false positives. Worse, when the specification evolves, this manual effort must be repeated. Inspired by the success of agentic systems in other SE domains, this paper introduces SmartOracle. SmartOracle decomposes the manual triage workflow into specialized Large Language Model (LLM) sub-agents. These agents synthesize independently gathered evidence from terminal runs and targeted specification queries to reach a final verdict. For historical benchmarks, SmartOracle achieves 0.84 recall with an 18% false positive rate. Compared to a sequential Gemini 2.5 Pro…
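
To make the notion of a noisy differential oracle concrete, here is a minimal Python sketch. It is not SmartOracle and not taken from the paper: the two "engines" are hypothetical toy functions standing in for JavaScript interpreters, the `div x y` mini-language is invented for illustration, and `normalized_oracle` is an assumed example of the kind of hand-written rule the abstract describes as manual and costly to maintain.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Outcome:
    status: str  # "ok" or "error"
    detail: str  # printed result, or "ErrorType: message"


def engine_a(program: str) -> str:
    """Hypothetical reference engine: 'div x y' means floor division."""
    _, x, y = program.split()
    if int(y) == 0:
        raise ZeroDivisionError("division by zero")
    return str(int(x) // int(y))


def engine_b(program: str) -> str:
    """Hypothetical second engine with a seeded bug: it truncates toward zero."""
    _, x, y = program.split()
    if int(y) == 0:
        raise ZeroDivisionError("cannot divide by 0")  # different wording only
    return str(int(int(x) / int(y)))


def run(engine: Callable[[str], str], program: str) -> Outcome:
    """Execute one program on one engine and capture result or error."""
    try:
        return Outcome("ok", engine(program))
    except Exception as exc:
        return Outcome("error", f"{type(exc).__name__}: {exc}")


def naive_oracle(outcomes: List[Outcome]) -> bool:
    """Report a divergence whenever any two outcomes differ in any way."""
    return len(set(outcomes)) > 1


def normalized_oracle(outcomes: List[Outcome]) -> bool:
    """Hand-written rule: for errors, compare the error type but not the message."""
    def key(o: Outcome) -> Tuple[str, str]:
        if o.status == "error":
            return ("error", o.detail.split(":", 1)[0])
        return ("ok", o.detail)
    return len({key(o) for o in outcomes}) > 1


def check(program: str) -> Tuple[bool, bool]:
    """Return (naive verdict, normalized verdict) for one program."""
    outcomes = [run(engine_a, program), run(engine_b, program)]
    return naive_oracle(outcomes), normalized_oracle(outcomes)


if __name__ == "__main__":
    for program in ["div 7 2", "div 7 0", "div -7 2"]:
        print(program, check(program))
```

In this toy setup, `div 7 0` diverges only in the wording of the error message, so the naive oracle reports a false positive that the normalized rule suppresses, while `div -7 2` exposes the seeded rounding bug under both oracles. Each such normalization rule has to be written and revised by hand, which is the manual effort the abstract says SmartOracle aims to automate.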

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Logic, programming, and type systems