Meta-Learning and Synthetic Data for Automated Pretraining and Finetuning

Fabio Ferreira

arXiv:2506.12161·cs.LG·June 17, 2025

TL;DR

This dissertation introduces meta-learning techniques to automate deep learning pipeline selection, hyperparameter tuning, and synthetic data generation, improving performance across vision and language tasks while reducing manual effort.

Contribution

It extends automated machine learning to deep learning by meta-learning pipeline ranking, data augmentation, and synthetic data generation for vision and language domains.

Findings

01. Meta-learning improves pipeline selection accuracy.

02. Synthetic data enhances downstream task performance.

03. Data augmentation is crucial in self-supervised learning.

Abstract

The growing number of pretrained models in Machine Learning (ML) presents significant challenges for practitioners. Given a new dataset, they need to determine the most suitable deep learning (DL) pipeline, consisting of the pretrained model and the hyperparameters for finetuning it. Moreover, as models grow in scale, the increasing reliance on real-world data poses a bottleneck for training and requires leveraging data more effectively. Addressing the first challenge often involves manual model selection and hyperparameter tuning. At the same time, as models grow larger and more of the available human-generated data is used for training, data augmentation and synthetic data become critical elements. Automated machine learning offers a path to address these challenges but is traditionally designed for tabular data and classical ML methods. This dissertation adopts…

Taxonomy

Topics: Model Reduction and Neural Networks · Machine Learning and Data Classification · Hydraulic and Pneumatic Systems