Automatic Identification of Parallelizable Loops Using Transformer-Based Source Code Representations
Izavan dos S. Correia, Henrique C. T. Santos, and Tiago A. E. Ferreira

TL;DR
This paper presents a Transformer-based method that uses DistilBERT to classify loops in source code as parallelizable, reporting over 99% accuracy and better generalization than prior token-based methods.
Contribution
It introduces a novel Transformer-based approach that simplifies preprocessing and improves generalization for loop parallelization classification.
Findings
Achieved over 99% accuracy in classifying parallelizable loops.
Outperformed prior token-based methods in generalization and efficiency.
Demonstrated robustness on real-world and synthetic code datasets.
Abstract
Automatic parallelization remains a challenging problem in software engineering, particularly in identifying code regions where loops can be safely executed in parallel on modern multi-core architectures. Traditional static analysis techniques, such as dependence analysis and polyhedral models, often struggle with irregular or dynamically structured code. In this work, we propose a Transformer-based approach to classify the parallelization potential of source code, focusing on distinguishing independent (parallelizable) loops from undefined ones. We adopt DistilBERT to process source code sequences using subword tokenization, enabling the model to capture contextual syntactic and semantic patterns without handcrafted features. The approach is evaluated on a balanced dataset combining synthetically generated loops and manually annotated real-world code, using 10-fold cross-validation and…
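The evaluation protocol mentioned in the abstract (10-fold cross-validation on a balanced dataset) can be illustrated with a minimal sketch. This is not the authors' code: the function name `stratified_k_fold`, the dataset size of 100 loops per class, and the random seed are assumptions made for illustration. Only the two label names, "independent" and "undefined", and the fold count come from the abstract. The sketch uses stratified splitting so that each fold keeps the class balance, which is one common way to run cross-validation on a balanced dataset; the abstract does not say whether the authors stratify.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs, spreading each class evenly over k folds.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's shuffled indices round-robin into the k folds,
    # so every fold receives roughly the same number of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves as the held-out test set exactly once.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Assumed toy dataset: 100 loops per class (the real dataset size is not given here).
labels = ["independent"] * 100 + ["undefined"] * 100
splits = list(stratified_k_fold(labels, k=10))
```

In an actual experiment, the classifier would be fine-tuned on `train_idx` and scored on `test_idx` for each of the ten splits, and the reported metric would be aggregated over the folds.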
