SemRep: Generative Code Representation Learning with Code Transformations
Weichen Li, Jiamin Song, Bogdan Alexandru Stoica, Arav Dhoot, Gabriel Ryan, Shengyu Fu, Kexin Pei

TL;DR
SemRep is a generative code representation learning framework that uses semantics-preserving transformations as an intermediate representation. It improves code transformation tasks such as optimization and editing in accuracy, efficiency, and robustness.
Contribution
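SemRep proposes using semantics-preserving transformations as an intermediate representation that serves both as a generative mid-training task and as guidance for subsequent instruction-specific code transformations, improving code transformation performance over existing methods.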
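To make the term concrete, the sketch below shows one simple semantics-preserving transformation, consistent identifier renaming, applied with Python's standard `ast` module. It is an illustration only and not SemRep's pipeline: the `RenameVariables` class, the `transform` and `load_function` helpers, and the sample `total` function are hypothetical names chosen for this example, and the paper's actual transformation set is not described in this summary.

```python
import ast

# Hypothetical input program used only for this illustration.
SOURCE = """
def total(xs):
    acc = 0
    for x in xs:
        acc = acc + x
    return acc
"""


class RenameVariables(ast.NodeTransformer):
    """Rename identifiers according to a fixed mapping.

    Renaming preserves behavior here because the new names do not
    collide with any existing name in the program.
    """

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Variable reads and writes.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        # Function parameters.
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def transform(source, mapping):
    """Parse the source, apply the renaming, and emit new source text."""
    tree = RenameVariables(mapping).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))  # Python 3.9+


def load_function(source, name):
    """Execute the source in a fresh namespace and return one function."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


variant = transform(SOURCE, {"xs": "values", "acc": "running_sum", "x": "item"})
original_fn = load_function(SOURCE, "total")
variant_fn = load_function(variant, "total")

# Agreement on sample inputs is evidence of equivalence, not a proof.
inputs = [[], [1], [1, 2, 3], [-4, 4, 10]]
same_behavior = all(original_fn(i) == variant_fn(i) for i in inputs)
print(variant)
print("same behavior on samples:", same_behavior)
```

The two programs differ textually but compute the same result, which is the property that lets such rewrites act as an intermediate representation of what the code means rather than how it is spelled.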
Findings
Outperforms baselines by 6.9% in correctness
Achieves 1.1x performance improvement
Enhances generalization and robustness
Abstract
Code transformation is a foundational capability in the software development process, and its effectiveness relies on constructing a high-quality code representation that characterizes the input code semantics and guides the transformation. Existing approaches treat code transformation as an end-to-end learning task, either leaving the construction of the representation needed for semantic reasoning implicit in model weights or relying on rigid compiler-level abstractions. We present SemRep, a framework that improves code transformation through generative code representation learning. Our key insight is to employ semantics-preserving transformations as the intermediate representation, which serves as both a generative mid-training task and the guidance for subsequent instruction-specific code transformations. Across general code editing and optimization tasks (e.g., GPU kernel optimization),…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques
