How Low Can You Go? The Data-Light SE Challenge

Kishan Kumar Ganguly; Tim Menzies

arXiv:2512.13524 · cs.SE · March 31, 2026

TL;DR

This paper challenges the assumption that extensive datasets and complex optimizers are necessary in Software Engineering, demonstrating that simple, lightweight methods often achieve near-optimal results with minimal data.

Contribution

It introduces the data-light challenge, formalizes labeling, proposes lightweight baselines, and provides empirical results showing when simple methods suffice in SE tasks.

Findings

01 Simple methods achieve over 90% of the best reported results with just a few dozen labels

02 Lightweight approaches perform as well as complex optimizers in many cases

03 Few samples are sufficient for rapid, cost-effective SE guidance

Abstract

Much of Software Engineering (SE) research assumes that progress depends on massive datasets and CPU-intensive optimizers. Yet has this assumption been rigorously tested? The counter-evidence presented in this paper suggests otherwise. For over 100 optimization tasks from recent SE papers (including software configuration, performance tuning, product line engineering, project health forecasting, defect prediction, software testing, software process and cost estimation, and cross-domain generalization datasets), even with just a few dozen labels, very simple methods (e.g., diversity sampling, a minimal Bayesian learner, its distance-based non-parametric variant, or random probes) achieve over 90% of the best reported results. Furthermore, these simple methods perform just as well as more complex state-of-the-art optimizers such as SMAC, TPE, and DEHB. While some tasks would require…
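To make the "random probes" idea concrete, here is a minimal sketch of what such a baseline might look like: label a small random sample of candidates and keep the best one. This is not the authors' implementation. The toy task, the `random_probe` and `percent_of_best` helper names, and the normalization (the share of the worst-to-best gap that is closed) are all assumptions made for illustration.

```python
import random

def random_probe(candidates, label, budget, seed=0):
    """Label `budget` randomly chosen candidates; return the lowest-cost one."""
    rng = random.Random(seed)
    probes = rng.sample(candidates, min(budget, len(candidates)))
    return min(probes, key=label)

def percent_of_best(found, best, worst):
    """Share of the worst-to-best gap that was closed (100 means optimal).

    Assumed normalization for this sketch, not necessarily the paper's metric.
    """
    if worst == best:
        return 100.0
    return 100.0 * (worst - found) / (worst - best)

# Hypothetical toy task: 10,000 two-parameter configurations and a cost to minimize.
rng = random.Random(1)
configs = [(rng.random(), rng.random()) for _ in range(10_000)]

def cost(c):
    return (c[0] - 0.3) ** 2 + (c[1] - 0.7) ** 2

costs = [cost(c) for c in configs]          # full labeling, used only for scoring
chosen = random_probe(configs, cost, budget=30)
score = percent_of_best(cost(chosen), min(costs), max(costs))
print(f"{score:.1f}% of best with 30 labels")
```

In a real task, labeling every candidate (as done here to compute the reference best and worst) would be the expensive step being avoided; it appears only so that the sketch can score itself.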

Code & Models

Repositories

KKGanguly/NEO (GitHub)
