A Unified Active Learning Framework for Annotating Graph Data with Application to Software Source Code Performance Prediction
Peter Samoaa, Linus Aronsson, Antonio Longa, Philipp Leitner, Morteza Haghir Chehreghani

TL;DR
This paper introduces a unified active learning framework for software performance prediction that leverages graph representations of source code to efficiently select data for annotation, reducing labeling effort while maintaining high prediction accuracy.
Contribution
The paper presents a novel framework combining code parsing, graph augmentation, and embedding techniques to enable task-agnostic active learning for software performance prediction.
Findings
High prediction accuracy with minimal labeled data
Effective use of graph embeddings for active learning
Framework adaptable to various regression methods
Abstract
Most machine learning and data analytics applications, including performance engineering in software systems, require large amounts of annotated, labelled data, which might not be available in advance. Acquiring annotations often demands significant time, effort, and computational resources. We develop a unified active learning framework specializing in software performance prediction to address this challenge. We begin by parsing the source code into an Abstract Syntax Tree (AST) and augmenting it with data and control flow edges. Then, we convert the tree representation of the source code to a Flow Augmented-AST graph (FA-AST) representation. Based on the graph representation, we construct various graph embeddings (unsupervised and supervised) into a latent space. Given such an embedding, the framework becomes task agnostic since active learning can be performed…
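The parsing and augmentation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses Python's standard `ast` module as a stand-in parser, and the function name `build_fa_ast` and the edge labels (`child`, `next_sibling`, `next_stmt`, `next_use`) are hypothetical names chosen here for illustration. The control flow and data flow edges are deliberately simplified approximations of what an FA-AST would contain.

```python
import ast


def build_fa_ast(source: str):
    """Build a simplified flow-augmented AST graph (illustrative only).

    Returns (nodes, edges) where nodes is a list of node-type names
    indexed by node id, and edges is a set of (src, dst, edge_type).
    """
    tree = ast.parse(source)
    nodes, edges, ids = [], set(), {}

    # Assign an integer id to every distinct AST node object.
    # CPython may reuse context objects such as Load/Store, so deduplicate.
    for node in ast.walk(tree):
        if id(node) not in ids:
            ids[id(node)] = len(nodes)
            nodes.append(type(node).__name__)

    for node in ast.walk(tree):
        children = list(ast.iter_child_nodes(node))
        # Structural edges: parent -> child (the plain AST).
        for child in children:
            edges.add((ids[id(node)], ids[id(child)], "child"))
        # Ordering edges between consecutive children.
        for a, b in zip(children, children[1:]):
            edges.add((ids[id(a)], ids[id(b)], "next_sibling"))
        # Simplified control flow: consecutive statements in a body.
        body = getattr(node, "body", None)
        if isinstance(body, list):
            for a, b in zip(body, body[1:]):
                edges.add((ids[id(a)], ids[id(b)], "next_stmt"))

    # Simplified data flow: link successive occurrences of each variable
    # name in source order (an assumption, not a full def-use analysis).
    last_seen = {}
    names = [n for n in ast.walk(tree) if isinstance(n, ast.Name)]
    for n in sorted(names, key=lambda n: (n.lineno, n.col_offset)):
        if n.id in last_seen:
            edges.add((last_seen[n.id], ids[id(n)], "next_use"))
        last_seen[n.id] = ids[id(n)]

    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_fa_ast("x = 1\ny = x + 2\n")
    for src, dst, kind in sorted(edges):
        print(f"{nodes[src]}({src}) -[{kind}]-> {nodes[dst]}({dst})")
```

The resulting node list and typed edge set could then be handed to a graph embedding method of choice; which embedding and which acquisition strategy to use is left open here, as the abstract excerpt does not specify them.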
Taxonomy
Topics: Software System Performance and Reliability · Software Engineering Research · Software Testing and Debugging Techniques
Methods: Test
