Code Compliance Assessment as a Learning Problem
Neela Sawant, Srinivasan H. Sengamedu

TL;DR
This paper proposes a machine learning approach to assess code compliance with policies by embedding code and policy text into a shared space, enabling scalable and effective compliance prediction and search.
Contribution
It introduces a novel ML framework that models code compliance as a joint embedding problem, leveraging data filtering and pre-training to handle the lack of task-specific data.
Findings
Policy2Code achieves higher accuracy than CodeBERT on benchmark datasets.
Zero-shot evaluation demonstrates the model's ability to generalize to unseen policies.
A user study shows that Policy2Code's detections are accepted more often than those of baseline methods.
Abstract
Manual code reviews and static code analyzers are the traditional mechanisms to verify if source code complies with coding policies. However, these mechanisms are hard to scale. We formulate code compliance assessment as a machine learning (ML) problem, to take as input a natural language policy and code, and generate a prediction on the code's compliance, non-compliance, or irrelevance. This can help scale compliance classification and search for policies not covered by traditional mechanisms. We explore key research questions on ML model formulation, training data, and evaluation setup. The core idea is to obtain a joint code-text embedding space which preserves compliance relationships via the vector distance of code and policy embeddings. As there is no task-specific data, we re-interpret and filter commonly available software datasets with additional pre-training and pre-finetuning…
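The joint-embedding idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy 2-D vectors stand in for the outputs of a trained encoder, the names `assess` and `search` are hypothetical, and the two distance thresholds (with non-compliant code placed between compliant and irrelevant code) are one possible way to map distance to the three labels, not the paper's confirmed decision rule.

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical thresholds; in practice they would be tuned on held-out data.
COMPLIANT_MAX = 0.3
RELEVANT_MAX = 0.8

def assess(policy_vec, code_vec):
    """Map the policy-code distance to one of three labels (assumed rule)."""
    d = cosine_distance(policy_vec, code_vec)
    if d <= COMPLIANT_MAX:
        return "compliant"
    if d <= RELEVANT_MAX:
        return "non-compliant"
    return "irrelevant"

def search(policy_vec, code_vecs):
    """Rank code snippet names by increasing distance to the policy."""
    return sorted(code_vecs, key=lambda name: cosine_distance(policy_vec, code_vecs[name]))

# Toy embeddings standing in for encoder outputs (assumed values).
policy = [1.0, 0.0]
snippets = {
    "code_a": [0.9, 0.1],
    "code_b": [0.4, 0.6],
    "code_c": [-0.2, 1.0],
}

labels = {name: assess(policy, vec) for name, vec in snippets.items()}
ranking = search(policy, snippets)
```

With these toy values, `labels` assigns a different label to each snippet and `ranking` orders the snippets from nearest to farthest, which is how the same embedding space could serve both classification and search.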
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Advanced Malware Detection Techniques
Methods: CodeBERT
