TL;DR
LARCH is a system that automatically generates accurate and coherent readme files for software repositories by identifying representative code fragments using heuristics and weak supervision, improving over baseline methods.
Contribution
The paper introduces LARCH, a novel approach that leverages heuristics and weak supervision to identify representative code for automatic readme generation using large language models.
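To make the idea of heuristic-based representative code identification concrete, here is a minimal sketch. It is not LARCH's implementation: the heuristics, the `SourceFile` type, and the function names are all hypothetical, and the sketch only sums heuristic votes rather than training a model on weakly labeled data. It shows one plausible way to rank files and pass the top-ranked one to an LLM prompt.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class SourceFile:
    path: str  # repository-relative path, POSIX style
    text: str  # file contents


# Hypothetical heuristics. Each votes +1 (likely representative),
# -1 (unlikely), or 0 (abstain).
def lf_entry_point_name(f: SourceFile) -> int:
    stem = PurePosixPath(f.path).stem.lower()
    return 1 if stem in {"main", "app", "cli", "__main__"} else 0


def lf_test_or_vendor(f: SourceFile) -> int:
    parts = {p.lower() for p in PurePosixPath(f.path).parts}
    return -1 if parts & {"tests", "test", "vendor", "third_party"} else 0


def lf_has_docstring(f: SourceFile) -> int:
    return 1 if '"""' in f.text else 0


def lf_shallow_path(f: SourceFile) -> int:
    # Files near the repository root are assumed to be more central.
    return 1 if len(PurePosixPath(f.path).parts) <= 2 else 0


HEURISTICS = [lf_entry_point_name, lf_test_or_vendor, lf_has_docstring, lf_shallow_path]


def score(f: SourceFile) -> int:
    """Sum the votes of all heuristics for one file."""
    return sum(h(f) for h in HEURISTICS)


def pick_representative(files: list[SourceFile]) -> SourceFile:
    """Return the highest-scoring file (first one wins on ties)."""
    return max(files, key=score)


def build_prompt(repo_name: str, f: SourceFile, max_chars: int = 2000) -> str:
    """Build an LLM prompt from a truncated copy of the chosen file."""
    snippet = f.text[:max_chars]
    return (
        f"Write a README for the repository '{repo_name}'.\n\n"
        f"Representative file ({f.path}):\n{snippet}\n"
    )
```

In a weak-supervision setting, votes like these would typically serve as noisy labels for training a classifier; the direct sum here is a simplification chosen to keep the example short.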
Findings
LARCH outperforms baseline methods in generating coherent readmes.
Human and automated evaluations show that LARCH generates coherent and factually correct readmes in the majority of cases.
LARCH is available as an open-source implementation with VS Code and CLI interfaces.
Abstract
Writing a readme is a crucial aspect of software development as it plays a vital role in managing and reusing program code. Though it is a pain point for many developers, automatically creating one remains a challenge even with the recent advancements in large language models (LLMs), because it requires generating an abstract description from thousands of lines of code. In this demo paper, we show that LLMs are capable of generating coherent and factually correct readmes if we can identify a code fragment that is representative of the repository. Building upon this finding, we developed LARCH (LLM-based Automatic Readme Creation with Heuristics), which leverages representative code identification with heuristics and weak supervision. Through human and automated evaluations, we illustrate that LARCH can generate coherent and factually correct readmes in the majority of cases,…
