Inferring Pluggable Types with Machine Learning
Kazi Amanul Islam Siddiqui, Martin Kellogg

TL;DR
This paper explores using machine learning models to automatically infer type qualifiers in pluggable type systems, aiming to ease deployment in legacy codebases by reducing manual annotation effort.
Contribution
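The paper introduces the NaP-AST representation, which encodes minimal dataflow hints, and evaluates several model architectures (Graph Transformer Network, Graph Convolutional Network, and Large Language Model), demonstrating the effectiveness of Graph Transformer Networks in inferring type qualifiers.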
Findings
The Graph Transformer Network (GTN) achieves 0.89 recall and 0.6 precision.
Models perform well with around 16k classes; overfitting occurs beyond 22k classes.
Applied to 12 open-source Java programs from a prior evaluation of the NullAway typechecker, the models lower warnings in all but one unannotated project.
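To make the recall and precision figures above concrete, the following is a minimal sketch of how such metrics are conventionally computed for a qualifier-inference task, and of the F1 score they imply. It is not the authors' evaluation code: the function names and the confusion counts in the usage lines are hypothetical, chosen only so the ratios land near the reported values. The summary does not report an F1 score, so the value computed here is derived arithmetic rather than a result from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion counts.

    tp: sites where a qualifier was predicted and is truly present
    fp: sites where a qualifier was predicted but is absent
    fn: sites where a qualifier is present but was not predicted
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical counts (not from the paper), picked to give ratios
# close to the reported 0.6 precision and 0.89 recall.
p, r = precision_recall(tp=89, fp=59, fn=11)
print(f"precision={p:.2f} recall={r:.2f}")

# F1 implied by the reported precision and recall.
print(f"F1={f1_score(0.6, 0.89):.3f}")
```

With the reported values, the implied F1 is approximately 0.717. High recall with moderate precision means the model finds most sites that need a qualifier at the cost of some spurious suggestions, and a typechecker run afterwards can flag those.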
Abstract
Pluggable type systems allow programmers to extend the type system of a programming language to enforce semantic properties defined by the programmer. Pluggable type systems are difficult to deploy in legacy codebases because they require programmers to write type annotations manually. This paper investigates how to use machine learning to infer type qualifiers automatically. We propose a novel representation, NaP-AST, that encodes minimal dataflow hints for the effective inference of type qualifiers. We evaluate several model architectures for inferring type qualifiers, including Graph Transformer Network, Graph Convolutional Network and Large Language Model. We further validated these models by applying them to 12 open-source programs from a prior evaluation of the NullAway pluggable typechecker, lowering warnings in all but one unannotated project. We discovered that GTN shows the…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Machine Learning and Data Classification
Methods: Attention Is All You Need · Softmax · Layer Normalization · Laplacian EigenMap · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer
