Learning Type Inference for Enhanced Dataflow Analysis
Lukas Seidel, Sedick David Baker Effendi, Xavier Pinho, Konrad Rieck, Brink van der Merwe, Fabian Yamaguchi

TL;DR
This paper introduces CodeTIDAL5, a Transformer-based model for type inference in dynamically-typed languages. The model improves prediction accuracy and integrates with static analysis tools to support code analysis and security research.
Contribution
We develop a novel Transformer-based model, CodeTIDAL5, that outperforms existing neural type inference systems and integrate it into Joern for practical static analysis applications.
Findings
CodeTIDAL5 achieves 71.27% accuracy on the ManyTypes4TypeScript benchmark.
The model outperforms the state of the art by 7.85%.
Integration into Joern improves static analysis results.
Abstract
Statically analyzing dynamically-typed code is a challenging endeavor, as even seemingly trivial tasks such as determining the targets of procedure calls are non-trivial without knowing the types of objects at compile time. Addressing this challenge, gradual typing is increasingly added to dynamically-typed languages, a prominent example being TypeScript that introduces static typing to JavaScript. Gradual typing improves the developer's ability to verify program behavior, contributing to robust, secure and debuggable programs. In practice, however, users only sparsely annotate types directly. At the same time, conventional type inference faces performance-related challenges as program size grows. Statistical techniques based on machine learning offer faster inference, but although recent approaches demonstrate overall improved accuracy, they still perform significantly worse on…
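The abstract's point about call targets can be made concrete with a minimal TypeScript sketch. When a receiver's type is unknown, a static analyzer must treat every class that defines a method with the called name as a possible target. Once the type is known, whether from an annotation or from a predicted type, the set of targets narrows. The method table, the class names `HttpClient` and `Logger`, and the helper `resolveCallTargets` are all hypothetical illustrations. They do not reflect the actual interfaces of Joern or CodeTIDAL5.

```typescript
// Program under analysis (hypothetical), shown for context only:
//   class HttpClient { send(p: string) { /* ... */ } }
//   class Logger     { send(p: string) { /* ... */ } }
//   function forward(conn, msg) { return conn.send(msg); }  // `conn` is unannotated

// Hypothetical method table: class name -> names of methods it defines.
const methodTable: Record<string, string[]> = {
  HttpClient: ["send"],
  Logger: ["send"],
};

// Return the possible targets of a call `receiver.method(...)`.
// If receiverType is undefined, every class defining `method` is a candidate;
// otherwise only the class matching the known receiver type remains.
function resolveCallTargets(method: string, receiverType?: string): string[] {
  const candidates = Object.keys(methodTable).filter(
    (cls) => methodTable[cls].indexOf(method) !== -1,
  );
  const narrowed =
    receiverType === undefined
      ? candidates
      : candidates.filter((cls) => cls === receiverType);
  return narrowed.map((cls) => `${cls}.${method}`);
}

// Call site `conn.send(msg)` without type information: ambiguous.
const untypedTargets = resolveCallTargets("send");

// The same call site once `conn` is known (annotated or inferred) to be HttpClient.
const typedTargets = resolveCallTargets("send", "HttpClient");

console.log(untypedTargets); // [ 'HttpClient.send', 'Logger.send' ]
console.log(typedTargets); // [ 'HttpClient.send' ]
```

In this sketch, a type supplied by a model would take the place of the `receiverType` argument. Each correctly inferred type removes spurious edges from the call graph, and dataflow analysis built on that call graph then becomes more precise.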
