An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification
Riddhi More, Jeremy S. Bradbury

TL;DR
This paper compares fine-tuning large language models and few-shot learning approaches for detecting and classifying flaky tests, highlighting their respective advantages and performance in resource-constrained scenarios.
Contribution
It introduces FlakyXbert, a few-shot learning method using Siamese networks, and evaluates its performance against fine-tuning on flaky test datasets.
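As a rough illustration of the few-shot idea, the sketch below classifies a test by comparing its embedding to per-category prototypes built from a handful of labeled examples, and shows a triplet loss of the kind often used to train Siamese encoders. It is not FlakyXbert's implementation. The toy 2-D vectors stand in for embeddings from a shared encoder, and the category names, the cosine distance, the margin value, and the choice of triplet loss are all assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss that pulls same-category pairs together and pushes
    different-category pairs apart (margin is an assumed value)."""
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def build_prototypes(support):
    """Average the few labeled embeddings of each category."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos

def classify(query, protos):
    """Return the category whose prototype is most similar to the query."""
    return max(protos, key=lambda label: cosine(query, protos[label]))

# Hypothetical support set: two labeled embeddings per flakiness category.
support = {
    "async_wait": [[1.0, 0.0], [0.8, 0.2]],
    "concurrency": [[0.0, 1.0], [0.2, 0.8]],
}
protos = build_prototypes(support)
predicted = classify([0.7, 0.3], protos)
easy = triplet_loss([1.0, 0.0], [0.8, 0.2], [0.0, 1.0])
hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

In this toy setup the query lands nearest the `async_wait` prototype, the well-separated triplet incurs no loss, and the triplet whose positive is far from the anchor is penalized. In a real pipeline the vectors would be produced by the trained encoder rather than written by hand.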
Findings
Fine-tuning achieves higher accuracy with more data.
Few-shot learning offers a cost-effective alternative.
Both methods are viable depending on resource availability.
Abstract
Flaky tests exhibit non-deterministic behavior during execution: they may pass or fail without any changes to the program under test. Detecting and classifying these flaky tests is crucial for maintaining the robustness of automated test suites and for ensuring overall reliability of, and confidence in, testing. However, flaky test detection and classification is challenging because test behavior varies with environmental conditions and subtle code interactions. Large Language Models (LLMs) offer promising approaches to address this challenge, with fine-tuning and few-shot learning (FSL) emerging as viable techniques. With enough data, fine-tuning a pre-trained LLM can achieve high accuracy, making it suitable for organizations with more resources. Alternatively, we introduce FlakyXbert, an FSL approach that employs a Siamese network architecture to train…
Taxonomy
Topics: Metallurgy and Material Forming
Methods: Siamese Network
