NeuroFlake: A Neuro-Symbolic LLM Framework for Flaky Test Classification

Khondaker Tasnia Hoque; Toukir Ahammed

arXiv:2605.11482·cs.SE·May 13, 2026

TL;DR

NeuroFlake is a neuro-symbolic framework that improves flaky test classification by integrating high-fidelity code tokens into LLMs, achieving better accuracy and robustness on imbalanced real-world datasets.

Contribution

The paper introduces Discriminative Token Mining (DTM), which augments LLM attention with symbolic signals to improve the performance and robustness of flaky test classification.
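The abstract describes DTM as automating the discovery of statistically significant source code tokens. One way such mining could look is sketched below. This is an illustration only, not the authors' implementation: the chi-squared test, the regex tokenizer, the 3.841 threshold (the p = 0.05 critical value at one degree of freedom), and all function names are assumptions introduced here.

```python
import re
from collections import Counter


def tokenize(source):
    """Return the set of identifier-like tokens in a test body.

    A regex tokenizer is a simplifying assumption; a real system would
    more likely use a language-aware lexer.
    """
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))


def chi_squared(a, b, c, d):
    """Chi-squared statistic for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A token present in every test (or in none) carries no signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def mine_discriminative_tokens(tests, labels, threshold=3.841):
    """Score each token by its association with the flaky label.

    tests  : list of test source strings
    labels : list of booleans, True meaning flaky
    Returns (token, score) pairs with score >= threshold, highest first.
    The statistic is symmetric, so a token tied to either label can appear.
    """
    n_flaky = sum(labels)
    n_stable = len(labels) - n_flaky
    flaky_counts, stable_counts = Counter(), Counter()
    for source, is_flaky in zip(tests, labels):
        (flaky_counts if is_flaky else stable_counts).update(tokenize(source))

    scored = []
    for token in set(flaky_counts) | set(stable_counts):
        a = flaky_counts[token]    # flaky tests containing the token
        b = stable_counts[token]   # stable tests containing the token
        c = n_flaky - a            # flaky tests without the token
        d = n_stable - b           # stable tests without the token
        score = chi_squared(a, b, c, d)
        if score >= threshold:
            scored.append((token, score))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical toy corpus: four flaky tests that sleep, four stable ones that do not.
tests = [
    "void testA() { Thread.sleep(100); assertTrue(ok); }",
    "void testB() { Thread.sleep(100); assertTrue(ok); }",
    "void testC() { Thread.sleep(100); assertTrue(ok); }",
    "void testD() { Thread.sleep(100); assertTrue(ok); }",
    "void testE() { assertTrue(ok); }",
    "void testF() { assertTrue(ok); }",
    "void testG() { assertTrue(ok); }",
    "void testH() { assertTrue(ok); }",
]
labels = [True, True, True, True, False, False, False, False]
print(mine_discriminative_tokens(tests, labels))
```

On this toy corpus only `Thread` and `sleep` pass the threshold. Per-test names such as `testA` occur too rarely to be significant, and tokens shared by every test score zero, which is the kind of filtering that would keep a classifier from latching onto incidental identifiers.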

Findings

01. NeuroFlake improves the F1-score from 65.79% to 69.34%, a gain of 3.55 percentage points (pp).

02. NeuroFlake remains stable under adversarial augmentations, with a performance drop of only 4-7 pp.

03. Baseline models degrade by 8-18 pp on perturbed tests.

Abstract

Flaky tests, which exhibit non-deterministic pass/fail behavior for the same version of code, pose significant challenges to reliable regression testing. While large language models (LLMs) show promise for automated flaky test classification, they often fail to comprehend the actual logic behind test flakiness, instead overfitting to superficial textual artifacts (e.g., specific variable names). This semantic fragility leads to poor generalization on imbalanced real-world datasets and vulnerability to perturbations. In this paper, we introduce NeuroFlake, a novel neuro-symbolic framework for classifying flaky tests on highly imbalanced, real-world datasets (FlakeBench). Unlike prior approaches that rely on brittle manual rules and black-box learning, NeuroFlake integrates a Discriminative Token Mining (DTM) module to automate the discovery of high-fidelity, statistically significant source code…
