Learning to Format Coq Code Using Language Models

Pengyu Nie; Karl Palmskog; Junyi Jessy Li; Milos Gligoric

arXiv:2006.16743·cs.HC·July 1, 2020

Learning to Format Coq Code Using Language Models

Pengyu Nie, Karl Palmskog, Junyi Jessy Li, Milos Gligoric

PDF

Open Access

TL;DR

This paper explores using language models to learn and suggest idiomatic formatting conventions in Coq code, aiming to improve code readability and consistency across diverse coding styles.

Contribution

It introduces initial models that learn formatting conventions from existing Coq code and suggests idiomatic formatting, demonstrating feasibility with a preliminary implementation.

Findings

01

Model trained on 164k lines of Coq code from MathComp projects

02

Preliminary results show potential for automatic formatting suggestions

03

Approach outperforms static rule-based formatting tools in flexibility

Abstract

Should the final right bracket in a record declaration be on a separate line? Should arguments to the rewrite tactic be separated by a single space? Coq code tends to be written in distinct manners by different people and teams. The expressiveness, flexibility, and extensibility of Coq's languages and notations means that Coq projects have a wide variety of recognizable coding styles, sometimes explicitly documented as conventions on naming and formatting. In particular, even inexperienced users can distinguish vernacular using the standard library and plain Ltac from idiomatic vernacular using the Mathematical Components (MathComp) library and SSReflect. While coding conventions are important for comprehension and maintenance, they are costly to document and enforce. Rule-based formatters, such as Coq's beautifier, have limited flexibility and only capture small fractions of desired…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsNatural Language Processing Techniques · Software Engineering Research · Topic Modeling