TL;DR
IR2Vec introduces a scalable, IR-based program embedding method that captures syntax and semantics, enabling faster training and improved performance in optimization tasks across multiple platforms.
Contribution
The paper presents IR2Vec, a novel IR-based embedding infrastructure with symbolic and flow-aware encodings, outperforming existing methods in speed and accuracy.
Findings
Outperforms existing methods in device mapping and thread coarsening tasks.
Enables faster training with non-sequential models.
Achieves state-of-the-art or improved results on benchmark suites.
Abstract
We propose IR2Vec, a Concise and Scalable encoding infrastructure to represent programs as a distributed embedding in continuous space. This distributed embedding is obtained by combining representation learning methods with flow information to capture the syntax as well as the semantics of the input programs. As our infrastructure is based on the Intermediate Representation (IR) of the source code, obtained embeddings are both language and machine independent. The entities of the IR are modeled as relationships, and their representations are learned to form a seed embedding vocabulary. Using this infrastructure, we propose two incremental encodings: Symbolic and Flow-Aware. Symbolic encodings are obtained from the seed embedding vocabulary, and Flow-Aware encodings are obtained by augmenting the Symbolic encodings with the flow information. We show the effectiveness of our methodology…
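The relationship between the two encodings can be illustrated with a minimal sketch, under explicit assumptions. The seed vocabulary below is a hypothetical, hand-picked toy table rather than learned vectors. The weights W_O, W_T and W_A are illustrative values. Operands are reduced to generic kinds (constant, argument, variable). The Flow-Aware step is simplified to substituting the vector of an operand's single SSA definition, ignoring loops and multiple reaching definitions. Names such as SEED, Instr and encode are hypothetical and are not part of IR2Vec's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Vec = List[float]
DIM = 4

# Hypothetical seed vocabulary: IR entity -> vector. Toy values chosen by hand
# for illustration; in IR2Vec the seed vocabulary is learned, not fixed.
SEED: Dict[str, Vec] = {
    "add": [1.0, 0.0, 0.0, 0.0],
    "mul": [0.0, 1.0, 0.0, 0.0],
    "ret": [0.0, 0.0, 1.0, 0.0],
    "i32": [0.0, 0.0, 0.0, 1.0],
    "const": [0.5, 0.5, 0.0, 0.0],     # generic "constant" operand
    "arg": [0.0, 0.5, 0.5, 0.0],       # generic "function argument" operand
    "var": [0.25, 0.25, 0.25, 0.25],   # generic "SSA variable" operand
}

# Illustrative weights for the opcode, type and operand terms (assumed values).
W_O, W_T, W_A = 1.0, 0.5, 0.2


def vadd(a: Vec, b: Vec) -> Vec:
    return [x + y for x, y in zip(a, b)]


def vscale(k: float, a: Vec) -> Vec:
    return [k * x for x in a]


@dataclass
class Instr:
    result: Optional[str]  # SSA name such as "%1", or None if no value is produced
    opcode: str
    ty: str
    operands: List[str]    # "%"-prefixed SSA names or generic seed entities


def operand_vec(op: str, defs: Dict[str, Vec], flow_aware: bool) -> Vec:
    if not op.startswith("%"):
        return SEED[op]
    # Symbolic: every SSA value looks the same (the generic "var" entry).
    # Flow-aware (simplified): use the vector of the defining instruction,
    # which in straight-line SSA code is the only reaching definition.
    return defs[op] if flow_aware else SEED["var"]


def encode(instrs: List[Instr], flow_aware: bool = False) -> Vec:
    defs: Dict[str, Vec] = {}
    total = [0.0] * DIM
    for ins in instrs:
        args = [0.0] * DIM
        for op in ins.operands:
            args = vadd(args, operand_vec(op, defs, flow_aware))
        # Instruction vector = weighted opcode + weighted type + weighted operands.
        v = vadd(vadd(vscale(W_O, SEED[ins.opcode]), vscale(W_T, SEED[ins.ty])),
                 vscale(W_A, args))
        if ins.result is not None:
            defs[ins.result] = v
        total = vadd(total, v)  # here, function vector = sum of instruction vectors
    return total


# Roughly: %1 = add i32 %a, 7 ; %2 = mul i32 %1, 3 ; ret i32 %2
# (the argument %a and the constants are given as generic seed entities)
prog = [
    Instr("%1", "add", "i32", ["arg", "const"]),
    Instr("%2", "mul", "i32", ["%1", "const"]),
    Instr(None, "ret", "i32", ["%2"]),
]

sym = encode(prog)
flow = encode(prog, flow_aware=True)
print([round(x, 3) for x in sym])
print([round(x, 3) for x in flow])
```

Both encodings give the same vector for the first instruction, whose operands have no definitions inside the function; they diverge from the second instruction onward, which is where the flow information enters.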
