Automated Type Annotation in Python Using Large Language Models
Varun Bharti, Shashwat Jha, Dhruv Kumar, Pankaj Jalote

TL;DR
This paper investigates using large language models to automate Python type annotations, proposing a generate-check-repair pipeline that achieves high consistency and accuracy without task-specific training.
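A minimal sketch of a generate-check-repair loop, under stated assumptions. The names `generate_check_repair`, `mypy_check`, the `propose`/`check` callables, and the `max_repairs` budget are hypothetical and chosen for illustration; this is not the authors' code. The LLM call is abstracted as a callable so any client can be plugged in. On a repair step it receives the previous candidate together with the checker's errors. `mypy_check` uses Mypy's programmatic entry point `mypy.api.run` and requires Mypy to be installed.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures: a proposer maps (code, checker errors) to annotated
# code; a checker returns error messages (an empty list means "type-checks").
Proposer = Callable[[str, List[str]], str]
Checker = Callable[[str], List[str]]


def generate_check_repair(
    source: str,
    propose: Proposer,
    check: Checker,
    max_repairs: int = 3,
) -> Tuple[str, List[str], int]:
    """Return (annotated_code, remaining_errors, repairs_used)."""
    # Generate: first proposal for the unannotated source, no feedback yet.
    candidate = propose(source, [])
    # Check: run the static type checker on the proposal.
    errors = check(candidate)
    repairs = 0
    # Repair: feed the previous candidate and its errors back to the proposer
    # until the checker is satisfied or the (assumed) budget is exhausted.
    while errors and repairs < max_repairs:
        candidate = propose(candidate, errors)
        errors = check(candidate)
        repairs += 1
    return candidate, errors, repairs


def mypy_check(code: str) -> List[str]:
    """Type-check `code` with Mypy and return its error lines (needs mypy)."""
    import os
    import tempfile

    from mypy import api  # imported lazily so the loop itself has no mypy dependency

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "snippet.py")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(code)
        stdout, _stderr, _status = api.run([path])
    # Keep only "path:line: error: ..." lines; notes and the summary line are dropped.
    return [line for line in stdout.splitlines() if ": error:" in line]
```

Keeping the proposer and checker as parameters makes the loop testable with stubs and lets the same loop be reused across different LLM variants; the returned repair count is the quantity an average-iterations figure would be computed from.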
Contribution
The authors introduce a novel LLM-based pipeline for Python type annotation that outperforms traditional methods and requires no fine-tuning.
Findings
GPT 4oMini achieves 65.9% consistency in annotations.
GPT 4.1mini and O3Mini reach approximately 88.6% consistency.
Up to 70.5% exact-match accuracy, with fewer than one repair iteration on average.
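One plausible way to compute exact-match accuracy and the average number of repair iterations is sketched below. The slot labels, the per-slot scoring, and the whitespace-only normalization are assumptions for illustration; the paper's precise matching rules are not given in this summary.

```python
from typing import Dict, Iterable


def _normalize(t: str) -> str:
    # Assumption: only whitespace differences are ignored; aliases such as
    # List[int] vs list[int] are treated as different types.
    return "".join(t.split())


def exact_match_accuracy(predicted: Dict[str, str], ground_truth: Dict[str, str]) -> float:
    """Fraction of ground-truth slots whose predicted type matches exactly."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for slot, gold in ground_truth.items()
        if slot in predicted and _normalize(predicted[slot]) == _normalize(gold)
    )
    return hits / len(ground_truth)


def mean_repairs(repairs_per_snippet: Iterable[int]) -> float:
    """Average number of repair iterations across snippets (0.0 if none)."""
    counts = list(repairs_per_snippet)
    return sum(counts) / len(counts) if counts else 0.0
```

For example, with ground truth `{"f.x": "int", "f.<return>": "List[int]"}`, predicting `list[int]` for the return slot scores 0.5 under this strict comparison, and per-snippet repair counts of `[0, 1, 0, 2]` average 0.75, i.e. fewer than one repair iteration.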
Abstract
Type annotations in Python enhance maintainability and error detection. However, writing these annotations manually is error-prone and requires extra effort. Traditional automation approaches such as static analysis, machine learning, and deep learning struggle with limited type vocabularies, behavioral over-approximation, and reliance on large labeled datasets. In this work, we explore the use of large language models (LLMs) for generating type annotations in Python. We develop a generate-check-repair pipeline: the LLM proposes annotations guided by a Concrete Syntax Tree (CST) representation, a static type checker (Mypy) verifies them, and any errors are fed back for iterative refinement. We evaluate four LLM variants, two general-purpose (GPT 4oMini, GPT 4.1mini) and two reasoning-optimized (O3Mini, O4Mini), on 6000 code snippets from the ManyTypes4Py benchmark. We first measure the proportion of code snippets annotated…
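Before prompting, a pipeline needs to know which annotation slots are still empty. The sketch below lists unannotated parameters and return types. It swaps in the standard-library `ast` module as a stand-in for the Concrete Syntax Tree the authors use: unlike a CST (for example one built with LibCST), `ast` discards comments and formatting, so it is only suitable for locating slots, not for writing annotations back into the source. The function name `unannotated_slots` and the `name.param` / `name.<return>` slot labels are hypothetical.

```python
import ast
from typing import List


def unannotated_slots(source: str) -> List[str]:
    """List parameter and return slots that still lack type annotations.

    Uses `ast` as a stand-in for a CST; slot labels are illustrative.
    """
    slots: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        # Build a new list so the parsed tree itself is not mutated.
        params = args.posonlyargs + args.args + args.kwonlyargs
        if args.vararg is not None:
            params.append(args.vararg)
        if args.kwarg is not None:
            params.append(args.kwarg)
        for arg in params:
            # Heuristic: skip conventional method receivers.
            if arg.annotation is None and arg.arg not in ("self", "cls"):
                slots.append(f"{node.name}.{arg.arg}")
        if node.returns is None:
            slots.append(f"{node.name}.<return>")
    return slots
```

For example, `unannotated_slots("def add(a, b: int):\n    return a + b\n")` returns `['add.a', 'add.<return>']`, the two slots an LLM would be asked to fill.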
Taxonomy
Topics: Software Engineering Research · Natural Language Processing Techniques · Computational Physics and Python Applications
