TL;DR
PSIMiner is a tool that extracts and enriches abstract syntax trees from code using IDE static analysis, improving machine learning models for code understanding tasks like method name prediction.
Contribution
The paper introduces PSIMiner, a tool for processing PSI trees from the IntelliJ Platform to enrich code representations for machine learning applications.
Findings
Successfully inferred types of identifiers in Java ASTs.
Extended the code2seq model with enriched AST data.
Improved accuracy in method name prediction.
Abstract
The application of machine learning algorithms to source code has grown in recent years. Since these algorithms are quite sensitive to input data, it is not surprising that researchers experiment with input representations. Nowadays, a popular starting point for representing code is the abstract syntax tree (AST). Abstract syntax trees have long been used in various software engineering domains, and in IDEs in particular. The APIs of modern IDEs make it possible to manipulate and traverse ASTs, resolve references between code elements, etc. Such algorithms can enrich ASTs with new data and therefore may be useful in ML-based code analysis. In this work, we present PSIMiner, a tool for processing PSI trees from the IntelliJ Platform. PSI trees contain code syntax trees as well as functions to work with them, and therefore can be used to enrich code representation using static analysis…
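To make the idea of enriching a syntax tree with resolved types concrete, here is a minimal sketch in Python. It is not PSIMiner's implementation and does not use the IntelliJ Platform API: the `Node` class, the `enrich` and `leaves` functions, the node kinds, and the single flat scope are all assumptions made for illustration, standing in for the reference resolution an IDE would perform.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A toy syntax-tree node (hypothetical; not the PSI or PSIMiner data model)."""
    kind: str                             # e.g. "METHOD", "LOCAL_VARIABLE", "REFERENCE"
    token: Optional[str] = None           # identifier text, if any
    declared_type: Optional[str] = None   # set only on declarations
    resolved_type: Optional[str] = None   # filled in by enrich()
    children: List["Node"] = field(default_factory=list)


def enrich(node: Node, scope: Optional[Dict[str, str]] = None) -> None:
    """Annotate identifier references with the type of their declaration.

    Assumption: one flat scope, visited in source order, stands in for
    the IDE's reference resolution.
    """
    if scope is None:
        scope = {}
    if node.declared_type is not None and node.token is not None:
        # A declaration: record its type and annotate the node itself.
        scope[node.token] = node.declared_type
        node.resolved_type = node.declared_type
    elif node.kind == "REFERENCE" and node.token is not None:
        # A usage: look up the declaration seen earlier, if any.
        node.resolved_type = scope.get(node.token, "<unknown>")
    for child in node.children:
        enrich(child, scope)


def leaves(node: Node) -> List[str]:
    """Render each leaf as 'token:type', one possible enriched input form."""
    if not node.children:
        return [f"{node.token}:{node.resolved_type}"] if node.token else []
    out: List[str] = []
    for child in node.children:
        out.extend(leaves(child))
    return out


# Toy tree for the Java statements: int count = 0; return count;
tree = Node("METHOD", children=[
    Node("LOCAL_VARIABLE", token="count", declared_type="int"),
    Node("RETURN", children=[Node("REFERENCE", token="count")]),
])
enrich(tree)
print(leaves(tree))  # ['count:int', 'count:int']
```

In this sketch the reference to `count` in the return statement carries the type `int` only because the declaration was resolved first; a purely syntactic tree would hold just the token. A real tool would rely on the IDE's own resolution rather than a hand-built scope.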
