DocuMint: Docstring Generation for Python using Small Language Models
Bibek Poudel, Adam Cook, Sekou Traore, Shelah Ameli

TL;DR
This paper evaluates small language models for generating high-quality Python docstrings, introduces a large dataset called DocuMint for fine-tuning, and demonstrates performance improvements through fine-tuning and benchmarking.
Contribution
The paper introduces DocuMint, a large-scale supervised fine-tuning dataset of 100,000 samples for docstring generation, and provides a comprehensive evaluation of how effectively small language models generate docstrings.
Findings
Llama 3 8B achieved the best quantitative scores across all metrics, including conciseness (0.605) and clarity (64.88).
CodeGemma 7B scored highest in human evaluation.
Fine-tuning with DocuMint improves model performance significantly.
Abstract
Effective communication, specifically through documentation, is the beating heart of collaboration among contributors in software development. Recent advancements in language models (LMs) have enabled the introduction of a new type of actor in that ecosystem: LM-powered assistants capable of code generation, optimization, and maintenance. Our study investigates the efficacy of small language models (SLMs) for generating high-quality docstrings by assessing accuracy, conciseness, and clarity, benchmarking performance quantitatively through mathematical formulas and qualitatively through human evaluation using a Likert scale. Further, we introduce DocuMint, a large-scale supervised fine-tuning dataset with 100,000 samples. In quantitative experiments, Llama 3 8B achieved the best performance across all metrics, with conciseness and clarity scores of 0.605 and 64.88, respectively.…
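To make the task concrete, the sketch below shows one way a (code, docstring) pair for supervised fine-tuning could be extracted from Python source. It is an illustration only, not the authors' pipeline: the function name extract_pairs, the record keys "code" and "docstring", and the choice to strip the docstring from the model input are all assumptions. It relies on the standard-library ast module and needs Python 3.9 or later for ast.unparse.

```python
import ast
import copy


def extract_pairs(source):
    """Return one {"code", "docstring"} record per documented function.

    Hypothetical helper: the record layout is an assumption, not the
    DocuMint schema.
    """
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        docstring = ast.get_docstring(node)
        if not docstring:
            continue  # skip functions without a usable docstring
        # Drop the docstring statement so the model input is code only;
        # fall back to `pass` if the docstring was the whole body.
        stripped = copy.deepcopy(node)
        stripped.body = stripped.body[1:] or [ast.Pass()]
        records.append({"code": ast.unparse(stripped), "docstring": docstring})
    return records


SAMPLE = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b

def undocumented(x):
    return x
'''

records = extract_pairs(SAMPLE)
for record in records:
    print(record["code"])
    print(record["docstring"])
```

Here the code without its docstring would serve as the model input and the docstring as the target; the undocumented function is skipped because it offers no target to learn from.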
Taxonomy
Topics: Mathematics, Computing, and Information Processing
Methods: LLaMA
