LLM-Driven Performance-Space Augmentation for Meta-Learning-Based Algorithm Selection
Darren Zhu, Daren Ler

TL;DR
This paper augments meta-datasets with synthetic regression datasets generated by a large language model (LLM) to improve meta-learning-based algorithm selection, reporting lower Hamming loss and higher subset accuracy and R^2 scores.
Contribution
The paper introduces an LLM-driven augmentation method for meta-datasets that steers synthetic dataset generation toward target regions of a low-dimensional performance space, with the aim of improving algorithm selection accuracy.
Findings
Uniform augmentation outperforms margin-based sampling in improving meta-learner performance.
Synthetic datasets generated via LLMs significantly enhance regression and multi-label evaluation results.
Augmentation reduces Hamming loss and improves subset accuracy and R^2 scores.
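For readers unfamiliar with the three metrics named above, the following is a minimal sketch of their standard definitions in plain Python. It is not taken from the paper; the function names and the toy inputs are illustrative assumptions, and the paper's actual evaluation protocol may differ.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots predicted incorrectly (lower is better)."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / total


def subset_accuracy(y_true, y_pred):
    """Fraction of rows whose entire label vector is predicted exactly."""
    exact = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy multi-label example (illustrative data only): 4 rows, 3 labels each.
labels_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
labels_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0, 1]]

# Toy regression example (illustrative data only).
values_true = [1.0, 2.0, 3.0, 4.0]
values_pred = [1.0, 2.0, 3.0, 5.0]
```

In the toy multi-label data, two of the twelve label slots are wrong and two of the four rows match exactly, which is why Hamming loss and subset accuracy can move independently of each other.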
Abstract
Meta-learning for algorithm selection relies on a meta-dataset in which each row corresponds to a supervised learning dataset described by meta-features and labelled with a target value that is associated with algorithm choice (typically, some function of algorithm performance). A persistent limitation is that the number of curated real-world datasets is small, resulting in sparse meta-datasets that constrain meta-learner generalisation. In this paper, we address this problem by augmenting the meta-dataset with synthetic regression datasets produced via a large language model (LLM), with generation steered toward target regions of a low-dimensionality performance space. In our experiments, we adopt a two-dimensional geometric setting defined by the cross-validated scores of two anchor algorithms, known as landmarkers. We compare two augmentation strategies: (1) uniform sampling,…
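The two-dimensional performance space described in the abstract can be illustrated with a minimal sketch: each dataset is mapped to a point given by the mean cross-validated scores of two landmarkers, and the uniform strategy draws target points evenly across that space. The helper names, the score bounds `low` and `high`, and the seed are hypothetical choices for illustration; the paper's own sampling procedure and the LLM prompting step are not reproduced here.

```python
import random
from statistics import mean


def performance_point(scores_a, scores_b):
    """Map the per-fold CV scores of two landmarkers to one 2-D point."""
    return (mean(scores_a), mean(scores_b))


def sample_uniform_targets(n, low=0.0, high=1.0, seed=0):
    """Draw n target points uniformly from the square [low, high] x [low, high].

    The bounds are an assumption; the valid score range depends on the
    metric the landmarkers are scored with.
    """
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    return [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n)]


# Hypothetical per-fold scores for two landmarkers on one real dataset.
point = performance_point([0.5, 0.7], [0.2, 0.4])

# Five target regions that a generator could then be steered toward.
targets = sample_uniform_targets(5)
```

Each sampled target would then serve as the desired pair of landmarker scores for one synthetic dataset; how closely a generated dataset lands on its target is a separate question that this sketch does not address.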
