Type Prediction With Program Decomposition and Fill-in-the-Type Training

Federico Cassano; Ming-Ho Yee; Noah Shinn; Arjun Guha; Steven Holtzen

arXiv:2305.17145 · cs.SE · May 30, 2023 · 1 citation

TL;DR

This paper introduces OpenTau, a search-based method that uses large language models to predict type annotations for untyped programs, raising the share of files that type check and lowering the number of type errors per file.

Contribution

The paper contributes a new metric for type prediction quality, a tree-based program decomposition that searches a space of generated types, and fill-in-the-type fine-tuning for LLMs.

Findings

01. 47.4% of files type check, a 14.5% absolute improvement

02. 3.3 type errors per file on average

03. Effective use of program decomposition and fill-in-the-type fine-tuning

Abstract

TypeScript and Python are two programming languages that support optional type annotations, which are useful but tedious to introduce and maintain. This has motivated automated type prediction: given an untyped program, produce a well-typed output program. Large language models (LLMs) are promising for type prediction, but there are challenges: fill-in-the-middle performs poorly, programs may not fit into the context window, generated types may not type check, and it is difficult to measure how well-typed the output program is. We address these challenges by building OpenTau, a search-based approach for type prediction that leverages large language models. We propose a new metric for type prediction quality, give a tree-based program decomposition that searches a space of generated types, and present fill-in-the-type fine-tuning for LLMs. We evaluate our work with a new dataset for…
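To make the fill-in-the-type idea from the abstract concrete, here is a minimal sketch of how a prompt with a single hole at a type annotation site might be assembled, so that the expected completion is only a type rather than arbitrary code. The sentinel strings, the `_hole_` marker, and the function names are assumptions made for illustration; they are not taken from the paper or from any particular model's vocabulary.

```python
# Hypothetical sentinel strings; real models define their own special tokens.
PREFIX, SUFFIX, MIDDLE = "<prefix>", "<suffix>", "<middle>"
HOLE = "_hole_"  # hypothetical marker for the annotation site to predict


def build_prompt(code_with_hole: str) -> str:
    """Split the program at its single hole and lay it out as prefix, suffix, middle."""
    if code_with_hole.count(HOLE) != 1:
        raise ValueError("expected exactly one hole marker")
    before, after = code_with_hole.split(HOLE)
    return f"{PREFIX}{before}{SUFFIX}{after}{MIDDLE}"


def fill_type(code_with_hole: str, predicted_type: str) -> str:
    """Substitute a predicted type back into the annotation site."""
    return code_with_hole.replace(HOLE, predicted_type.strip())


# A TypeScript snippet with one annotation site left open.
program = "function add(a: _hole_, b) { return a + b; }"
prompt = build_prompt(program)
typed = fill_type(program, " number")  # pretend the model answered " number"
```

In a search-based setting, several candidate types could be substituted this way and each resulting program handed to a type checker; that step is omitted here because it depends on tooling the abstract does not describe.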

Code & Models

Repositories

gammatauai/opentau (PyTorch, official)

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques