Static Program Slicing Using Language Models With Dataflow-Aware Pretraining and Constrained Decoding
Pengfei He, Shaowei Wang, Tse-Hsun Chen, Muhammad Asaduzzaman

TL;DR
Sliceformer combines dataflow-aware pretraining and constrained decoding with small language models to improve the accuracy and reliability of static program slicing, outperforming baselines by up to 22% in ExactMatch.
Contribution
The paper introduces Sliceformer, a novel approach that reformulates static program slicing as a sequence-to-sequence task with dataflow-aware training and constrained decoding.
Findings
Up to 22% improvement in ExactMatch over baselines.
Effective modeling of data dependencies through dataflow-aware pretraining.
Reduced hallucination in generated slices via constrained decoding.
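The hallucination finding can be illustrated with a toy lexical constraint: at each decoding step, only tokens that occur in the source program (plus an end marker) may be emitted. This is a minimal sketch under stated assumptions, not Sliceformer's actual decoder. The function `constrained_greedy_decode`, the `<eos>` marker, and the hand-written per-step score dictionaries are hypothetical stand-ins for a real model's logits.

```python
def constrained_greedy_decode(step_scores, source_tokens, eos="<eos>"):
    """Greedy decoding restricted to tokens found in the source program.

    step_scores: list of dicts mapping candidate token -> score, one dict
                 per decoding step (hypothetical stand-in for model logits).
    source_tokens: tokens of the program being sliced.
    """
    allowed = set(source_tokens) | {eos}
    output = []
    for scores in step_scores:
        # Discard every candidate that does not occur in the source.
        candidates = {tok: s for tok, s in scores.items() if tok in allowed}
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        if best == eos:
            break
        output.append(best)
    return output


source = ["x", "=", "a", "+", "b"]
scores = [
    {"x": 0.6, "y": 0.9},      # "y" scores highest but is not in the source
    {"=": 0.8, "==": 0.7},
    {"a": 0.5, "foo": 0.95},   # "foo" would be a hallucinated identifier
    {"<eos>": 0.9, "+": 0.1},
]
decoded = constrained_greedy_decode(scores, source)
print(decoded)
```

Without the filter, plain greedy decoding over the same scores would pick `y` and `foo`, neither of which appears in the program. The constraint trades some flexibility for the guarantee that every emitted token comes from the input.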
Abstract
Static program slicing is a fundamental software engineering technique for isolating code relevant to specific variables. While recent learning-based approaches using language models (LMs) show promise in automating slice prediction, they suffer from inaccurate dependency modeling and unconstrained generation, where LMs fail to capture precise data flow relations and produce slices containing hallucinated tokens and statements. To address these challenges, we propose Sliceformer, a novel approach that reformulates static program slicing as a sequence-to-sequence task using small language models such as CodeT5+. Sliceformer introduces two key innovations that directly target the identified limitations. First, to improve dependency modeling, we design dataflow-aware pretraining objectives that leverage data flow graphs (DFG) to teach models data dependencies through dataflow-preserving…
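To make the task concrete, a static backward slice for a variable keeps only the statements its value may depend on. The following is a minimal sketch for straight-line assignments that tracks data dependencies only and ignores control flow. The `(target, used_vars, text)` statement representation and the function `backward_slice` are hypothetical illustrations and are unrelated to how Sliceformer itself predicts slices.

```python
def backward_slice(statements, criterion_var):
    """Return indices of statements that criterion_var depends on.

    statements: list of (target, used_vars, text) tuples in program order.
    The slicing criterion is the value of criterion_var at program end.
    """
    relevant = {criterion_var}
    kept = []
    # Walk backwards, following definitions of variables still needed.
    for index in range(len(statements) - 1, -1, -1):
        target, used, _text = statements[index]
        if target in relevant:
            kept.append(index)
            relevant.discard(target)  # this definition shadows earlier ones
            relevant.update(used)     # its operands become relevant
    return sorted(kept)


program = [
    ("a", set(), "a = 1"),
    ("b", set(), "b = 2"),
    ("c", {"a"}, "c = a + 5"),
    ("d", {"b"}, "d = b * 2"),
    ("c", {"c", "a"}, "c = c + a"),
]
slice_lines = [program[i][2] for i in backward_slice(program, "c")]
print(slice_lines)
```

Here the statements defining `b` and `d` are dropped because the final value of `c` never reads them. A learned slicer is expected to reproduce this kind of output directly from the source text.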
