A Library for Representing Python Programs as Graphs for Machine Learning
David Bieber, Kensen Shi, Petros Maniatis, Charles Sutton, Vincent Hellendoorn, Daniel Johnson, Daniel Tarlow

TL;DR
This paper introduces an open source Python library that constructs various graph representations of Python programs, facilitating machine learning research on code by combining control-flow, data-flow, and syntactic information.
Contribution
The paper presents a novel library, python_graphs, that enables static analysis and graph construction of Python programs for machine learning applications.
Findings
Applied in a case study to millions of competitive programming submissions
Supports multiple graph types including control-flow and data-flow
Demonstrates utility in machine learning research on code
Abstract
Graph representations of programs are commonly a central element of machine learning for code research. We introduce an open source Python library python_graphs that applies static analysis to construct graph representations of Python programs suitable for training machine learning models. Our library admits the construction of control-flow graphs, data-flow graphs, and composite "program graphs" that combine control-flow, data-flow, syntactic, and lexical information about a program. We present the capabilities and limitations of the library, perform a case study applying the library to millions of competitive programming submissions, and showcase the library's utility for machine learning research.
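To make the idea of a composite graph concrete, here is a minimal sketch that uses only Python's standard `ast` module. It is not the python_graphs API or its algorithm: the function name `build_toy_program_graph` and the edge labels `child` and `next_occurrence` are hypothetical, chosen for illustration. It builds syntactic parent-to-child edges from the abstract syntax tree and adds a crude lexical edge between consecutive occurrences of the same identifier, which only loosely approximates the data-flow information the abstract describes.

```python
import ast
from collections import defaultdict


def build_toy_program_graph(source):
    """Return (node_types, edges) for a toy graph of `source`.

    Illustrative assumption only; this is not how python_graphs works.
    Each edge is a (source_index, destination_index, label) tuple.
    """
    tree = ast.parse(source)
    node_types = []  # node index -> AST class name
    edges = []
    name_sites = []  # (lineno, col_offset, identifier, node index)

    def visit(node):
        # Assign indices in visit order so every occurrence gets its own node.
        idx = len(node_types)
        node_types.append(type(node).__name__)
        if isinstance(node, ast.Name):
            name_sites.append((node.lineno, node.col_offset, node.id, idx))
        for child in ast.iter_child_nodes(node):
            child_idx = visit(child)
            edges.append((idx, child_idx, "child"))  # syntactic edge
        return idx

    visit(tree)

    # Link consecutive occurrences of each identifier in source order.
    by_name = defaultdict(list)
    for lineno, col, identifier, idx in sorted(name_sites):
        by_name[identifier].append(idx)
    for indices in by_name.values():
        for a, b in zip(indices, indices[1:]):
            edges.append((a, b, "next_occurrence"))

    return node_types, edges


SOURCE = "x = 1\ny = x + 2\nprint(y)\n"
node_types, edges = build_toy_program_graph(SOURCE)
child_edges = [e for e in edges if e[2] == "child"]
occurrence_edges = [e for e in edges if e[2] == "next_occurrence"]
```

The syntactic edges form a tree, so there is always one fewer `child` edge than there are nodes. A real control-flow or data-flow analysis must also handle branches, loops, and scoping, which this sketch ignores.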
Taxonomy
Topics: Software Engineering Research · Machine Learning and Algorithms · Machine Learning in Materials Science
Methods: Lib
