MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations
Akash Dutta, Ali Jannesari

TL;DR
MIREncoder is a multi-modal IR-based auto-encoder that pre-trains embeddings capturing code semantics and structure, enabling transfer learning for performance optimization with reduced overhead in high-performance computing.
Contribution
The paper introduces MIREncoder, a novel multi-modal IR-based auto-encoder whose pre-trained embeddings can be reused across code analysis and optimization tasks, avoiding the large training costs of LLM-style approaches.
Findings
Outperforms state-of-the-art methods in code optimization tasks.
Reduces overhead compared to large language model approaches.
Effectively captures code semantics and structure for downstream use.
Abstract
One of the primary areas of interest in High Performance Computing is the improvement of performance of parallel workloads. Nowadays, compilable source code-based optimization tasks that employ deep learning often exploit LLVM Intermediate Representations (IRs) for extracting features from source code. Most such works target specific tasks, or are designed with a pre-defined set of heuristics. So far, pre-trained models are rare in this domain, but the possibilities have been widely discussed. Especially approaches mimicking large-language models (LLMs) have been proposed. But these have prohibitively large training costs. In this paper, we propose MIREncoder, a Multi-modal IR-based Auto-Encoder that can be pre-trained to generate a learned embedding space to be used for downstream tasks by machine learning-based approaches. A multi-modal approach enables us to better extract features…
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management
Methods: Sparse Evolutionary Training
