Data-efficient Performance Modeling via Pre-training
Chunting Liu, Riyadh Baghdadi

TL;DR
This paper presents a self-supervised pre-training approach using autoencoders to significantly reduce labeled data requirements for performance modeling in code optimization, achieving comparable accuracy with less data.
Contribution
Introduces a self-supervised pre-training scheme based on autoencoders that improves performance-model accuracy while sharply reducing the amount of labeled data required.
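The pre-train-then-embed idea can be illustrated with a toy example. The sketch below is not the paper's architecture: it assumes programs are already encoded as fixed-length feature vectors (here synthetic 6-dimensional vectors), and it uses a small linear autoencoder trained with plain stochastic gradient descent. The class `LinearAutoencoder` and the helpers `pretrain` and `reconstruction_loss` are hypothetical names introduced only for illustration.

```python
import random


class LinearAutoencoder:
    """Toy linear autoencoder: x -> z = enc @ x -> x_hat = dec @ z."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        self.enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                    for _ in range(n_hidden)]
        self.dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                    for _ in range(n_features)]

    def encode(self, x):
        return [sum(w * v for w, v in zip(row, x)) for row in self.enc]

    def decode(self, z):
        return [sum(w * v for w, v in zip(row, z)) for row in self.dec]

    def train_step(self, x, lr):
        # One SGD step on the loss 0.5 * ||decode(encode(x)) - x||^2.
        z = self.encode(x)
        err = [r - v for r, v in zip(self.decode(z), x)]
        # Gradient w.r.t. z, computed before the decoder weights change.
        grad_z = [sum(err[i] * self.dec[i][j] for i in range(len(x)))
                  for j in range(len(z))]
        for i in range(len(x)):
            for j in range(len(z)):
                self.dec[i][j] -= lr * err[i] * z[j]
        for j in range(len(z)):
            for k in range(len(x)):
                self.enc[j][k] -= lr * grad_z[j] * x[k]


def reconstruction_loss(model, data):
    # Mean squared reconstruction error over the dataset.
    total = 0.0
    for x in data:
        x_hat = model.decode(model.encode(x))
        total += sum((r - v) ** 2 for r, v in zip(x_hat, x)) / len(x)
    return total / len(data)


def pretrain(model, unlabeled, epochs=300, lr=0.01):
    # Self-supervised: the input is its own target, so no labels are needed.
    for _ in range(epochs):
        for x in unlabeled:
            model.train_step(x, lr)


# Synthetic stand-in for unlabeled program features (an assumption):
# each vector is generated from two latent factors a and b.
rng = random.Random(1)
unlabeled = []
for _ in range(50):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    unlabeled.append([a, b, a + b, a - b, 2 * a, 0.5 * b])

model = LinearAutoencoder(n_features=6, n_hidden=2)
loss_before = reconstruction_loss(model, unlabeled)
pretrain(model, unlabeled)
loss_after = reconstruction_loss(model, unlabeled)

# The encoder output is the embedding a downstream performance model would consume.
embedding = model.encode(unlabeled[0])
```

After pre-training, only the encoder is needed: a downstream regressor would be trained on `model.encode(x)` with the smaller labeled set, which is where the data savings are expected to come from.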
Findings
Matches the accuracy of the original model with 5x less labeled data.
Reduces data collection time and cost.
Improves model accuracy in code performance prediction.
Abstract
Performance models are essential for automatic code optimization, enabling compilers to predict the effects of code transformations on performance and guide the search for optimal transformations. Building state-of-the-art performance models with deep learning, however, requires vast labeled datasets of random programs, an expensive and time-consuming process that can stretch over months. This paper introduces a self-supervised pre-training scheme with autoencoders to reduce the need for labeled data. By pre-training on a large dataset of random programs, the autoencoder learns representations of code and transformations, which are then used to embed programs for the performance model. Implemented in the Tiramisu autoscheduler, our approach improves model accuracy with less data. For example, to achieve a MAPE of 20.72%, the original model requires 18 million data points, whereas our method…
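The accuracy figure quoted in the abstract is a MAPE (mean absolute percentage error). As a minimal sketch of how that metric is computed, assuming the model predicts a positive quantity such as a speedup and that the measured values are nonzero, the function `mape` and the sample numbers below are illustrative only and are not taken from the paper:

```python
def mape(measured, predicted):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same length and that no measured
    value is zero (the relative error would be undefined).
    """
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(m - p) / abs(m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


# Hypothetical measured vs. predicted values for three programs.
measured = [2.0, 4.0, 10.0]
predicted = [1.5, 5.0, 10.0]
score = mape(measured, predicted)  # relative errors: 0.25, 0.25, 0.0
```

A lower MAPE means predictions are closer to measurements in relative terms, so reaching the same MAPE with fewer labeled data points is a direct measure of data efficiency.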
Taxonomy
Topics: Software System Performance and Reliability
