Learning to Format Coq Code Using Language Models
Pengyu Nie, Karl Palmskog, Junyi Jessy Li, Milos Gligoric

TL;DR
This paper explores using language models to learn and suggest idiomatic formatting conventions in Coq code, aiming to improve code readability and consistency across diverse coding styles.
Contribution
It introduces initial models that learn formatting conventions from existing Coq code and suggests idiomatic formatting, demonstrating feasibility with a preliminary implementation.
Findings
Model trained on 164k lines of Coq code from MathComp projects
Preliminary results show potential for automatic formatting suggestions
Approach outperforms static rule-based formatting tools in flexibility
Abstract
Should the final right bracket in a record declaration be on a separate line? Should arguments to the rewrite tactic be separated by a single space? Coq code tends to be written in distinct manners by different people and teams. The expressiveness, flexibility, and extensibility of Coq's languages and notations means that Coq projects have a wide variety of recognizable coding styles, sometimes explicitly documented as conventions on naming and formatting. In particular, even inexperienced users can distinguish vernacular using the standard library and plain Ltac from idiomatic vernacular using the Mathematical Components (MathComp) library and SSReflect. While coding conventions are important for comprehension and maintenance, they are costly to document and enforce. Rule-based formatters, such as Coq's beautifier, have limited flexibility and only capture small fractions of desired…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Software Engineering Research · Topic Modeling
