LLM-Driven Performance-Space Augmentation for Meta-Learning-Based Algorithm Selection

Darren Zhu; Daren Ler

arXiv:2605.09518 · cs.LG · May 12, 2026

TL;DR

This paper augments sparse meta-datasets with synthetic regression datasets generated by a large language model (LLM) to improve meta-learning-based algorithm selection, reporting lower Hamming loss and higher subset accuracy and R^2 for the meta-learner.

Contribution

The paper introduces an LLM-driven augmentation method for meta-learning in which dataset generation is steered toward target regions of a low-dimensional performance space, with the aim of improving algorithm selection accuracy.

Findings

01

Uniform augmentation outperforms margin-based sampling in improving meta-learner performance.

02

Synthetic datasets generated via LLMs significantly enhance regression and multi-label evaluation results.

03

Augmentation reduces Hamming loss and improves subset accuracy and R^2 scores.
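
The three metrics named in the findings can be computed with scikit-learn as in the minimal sketch below. The label matrices and regression values are made-up toy numbers, not results from the paper, and the reading of the multi-label target (one column per candidate algorithm) is an assumption made for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, r2_score

# Toy multi-label meta-targets (hypothetical): one row per dataset,
# one column per candidate algorithm, 1 = algorithm flagged for that dataset.
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_pred = np.array([[1, 0], [0, 1], [1, 0], [0, 0]])

# Hamming loss: fraction of individual labels that are wrong (lower is better).
h_loss = hamming_loss(y_true, y_pred)          # 1 wrong label out of 8 -> 0.125

# Subset accuracy: fraction of rows whose whole label vector matches exactly.
# For multi-label indicator input, accuracy_score computes exactly this.
subset_acc = accuracy_score(y_true, y_pred)    # 3 of 4 rows match -> 0.75

# R^2 for a regression-style meta-target (toy values, hypothetical).
perf_true = np.array([0.2, 0.5, 0.9])
perf_pred = np.array([0.25, 0.45, 0.8])
r2 = r2_score(perf_true, perf_pred)

print(h_loss, subset_acc, round(r2, 4))
```

Hamming loss penalises each wrong label separately, while subset accuracy gives no credit for a partially correct row, so the two can move by different amounts under the same change to the meta-learner.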

Abstract

Meta-learning for algorithm selection relies on a meta-dataset in which each row corresponds to a supervised learning dataset described by meta-features and labelled with a target value that is associated with algorithm choice (typically, some function of algorithm performance). A persistent limitation is that the number of curated real-world datasets is small, resulting in sparse meta-datasets that constrain meta-learner generalisation. In this paper, we address this problem by augmenting the meta-dataset with synthetic regression datasets produced via a large language model (LLM), with generation steered toward target regions of a low-dimensionality performance space. In our experiments, we adopt a two-dimensional geometric setting defined by the cross-validated $R^{2}$ scores of two anchor algorithms, known as landmarkers. We compare two augmentation strategies: (1) uniform sampling,…
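
As a rough illustration of the two-dimensional performance space described in the abstract, the sketch below maps one regression dataset to a point given by the mean cross-validated R^2 of two landmarkers, and draws uniformly sampled target points. The choice of landmarkers (linear regression and a shallow decision tree), the 5-fold setting, the [0, 1] bounds, and the function names `performance_point` and `uniform_targets` are all assumptions made here; the excerpt does not state which the authors use, and the LLM generation step itself is not shown.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def performance_point(X, y, cv=5):
    """Map a regression dataset to (R^2 of landmarker A, R^2 of landmarker B).

    The two landmarkers below are illustrative stand-ins, not the paper's.
    """
    landmarkers = (
        LinearRegression(),
        DecisionTreeRegressor(max_depth=3, random_state=0),
    )
    # Mean cross-validated R^2 per landmarker gives one coordinate each.
    return tuple(
        float(cross_val_score(model, X, y, cv=cv, scoring="r2").mean())
        for model in landmarkers
    )


def uniform_targets(n, low=0.0, high=1.0, seed=0):
    """Draw n target points uniformly from [low, high]^2 (assumed bounds)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(low, high, size=(n, 2))


# A synthetic dataset stands in for a real or LLM-generated one.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
point = performance_point(X, y)
targets = uniform_targets(4)

print("dataset lands at:", point)
print("uniformly sampled targets:\n", targets)
```

In a steering loop of the kind the abstract describes, a generated dataset would be kept or regenerated depending on how close its `point` falls to the requested target; the acceptance rule is left out here because the excerpt does not specify one.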
