Finding Reusable Machine Learning Components to Build Programming Language Processing Pipelines
Patrick Flynn, Tristan Vanderbruggen, Chunhua Liao, Pei-Hung Lin, Murali Emani, Xipeng Shen

TL;DR
This paper analyzes the landscape of machine learning components for Programming Language Processing, aiming to improve their findability, accessibility, interoperability, and reusability to facilitate pipeline construction.
Contribution
It systematically characterizes key concepts and demonstrates how reusable components can be leveraged to build effective PLP pipelines.
Findings
Identification of core PLP tasks, models, and tools
Examples of reusable components in PLP pipeline construction
Enhanced understanding of component interoperability in PLP
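To make the idea of reusable, interoperable components concrete, here is a minimal sketch in Python. It is not taken from the paper: the `Component` record, the `Registry` class, `build_pipeline`, and the two toy stages are hypothetical names chosen for illustration. The sketch assumes that each component declares its kind plus its input and output types, so that components can be looked up by kind (findability) and checked for compatibility before being chained (interoperability).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Component:
    """Hypothetical metadata record for one reusable pipeline stage."""
    name: str
    kind: str          # e.g. "tokenizer" or "model" (illustrative labels)
    input_type: str    # declared input type, used for compatibility checks
    output_type: str   # declared output type
    run: Callable[[Any], Any]


class Registry:
    """Toy catalog that lets components be found by kind."""

    def __init__(self) -> None:
        self._items: Dict[str, Component] = {}

    def register(self, component: Component) -> None:
        self._items[component.name] = component

    def find(self, kind: str) -> List[Component]:
        return [c for c in self._items.values() if c.kind == kind]


def build_pipeline(stages: List[Component]) -> Callable[[Any], Any]:
    """Chain stages after checking that adjacent declared types match."""
    for prev, nxt in zip(stages, stages[1:]):
        if prev.output_type != nxt.input_type:
            raise TypeError(
                f"{prev.name} outputs {prev.output_type}, "
                f"but {nxt.name} expects {nxt.input_type}"
            )

    def pipeline(value: Any) -> Any:
        for stage in stages:
            value = stage.run(value)
        return value

    return pipeline


# Toy stages standing in for real PLP components (assumptions, not real tools).
whitespace_tokenizer = Component(
    name="whitespace-tokenizer", kind="tokenizer",
    input_type="source_text", output_type="tokens",
    run=lambda text: text.split(),
)
token_counter = Component(
    name="token-counter", kind="model",
    input_type="tokens", output_type="int",
    run=len,
)

registry = Registry()
registry.register(whitespace_tokenizer)
registry.register(token_counter)

# Look up one component of each kind and chain them.
pipeline = build_pipeline(
    [registry.find("tokenizer")[0], registry.find("model")[0]]
)
token_count = pipeline("int x = 0 ;")
```

In a real setting the toy stages would be replaced by an actual tokenizer, a pretrained model, and a task-specific head; the declared types are the part that lets a mismatch be caught when the pipeline is assembled rather than when it runs.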
Abstract
Programming Language Processing (PLP) using machine learning has made vast improvements in the past few years, and a growing number of people are interested in exploring this promising field. However, it is challenging for new researchers and developers to find the right components to construct their own machine learning pipelines, given the diverse PLP tasks to be solved, the large number of datasets and models being released, and the complex compilers and tools involved. To improve the findability, accessibility, interoperability, and reusability (FAIRness) of machine learning components, we collect and analyze a set of representative papers in the domain of machine learning-based PLP. We then identify and characterize key concepts, including PLP tasks, model architectures, and supportive tools. Finally, we show some example use cases of leveraging the reusable components to construct…
Taxonomy
Topics: Machine Learning in Materials Science · Software Engineering Research · Machine Learning and Data Classification
