TL;DR
This paper introduces CommitBERT, a pre-trained model that automatically generates commit messages. It is trained on a large dataset of code changes paired with commit messages in six programming languages, with the goal of producing better natural language descriptions of code modifications.
Contribution
The paper presents a new dataset of 345K code modifications and commit messages, and proposes two training methods to enhance commit message generation with a pre-trained programming language model.
Findings
The proposed training methods improve BLEU-4 scores on commit message generation.
Preprocessing and domain-specific pre-training significantly enhance performance.
The dataset and models are publicly available for further research.
Abstract
A commit message is a document that summarizes source code changes in natural language. A good commit message clearly describes the source code changes, which enhances collaboration between developers. Our work therefore aims to develop a model that automatically writes the commit message. To this end, we release a 345K-example dataset consisting of code modifications and commit messages in six programming languages (Python, PHP, Go, Java, JavaScript, and Ruby). As in a neural machine translation (NMT) model, we feed the code modification to the encoder input and the commit message to the decoder input, and we evaluate the generated commit messages with BLEU-4. We also propose two training methods to improve commit message generation: (1) A method of preprocessing the input to feed the code modification to the encoder input. (2) A…
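The abstract evaluates generated commit messages with BLEU-4. As a rough illustration of what that metric measures, the sketch below computes an unsmoothed, sentence-level BLEU-4 against a single reference using whitespace tokenization. The function names `ngrams` and `bleu4` are hypothetical, and the tokenization, smoothing, and corpus-level aggregation used in the paper are not stated here, so this should be read as an assumption-laden sketch rather than the authors' evaluation script.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(reference, candidate):
    """Unsmoothed sentence-level BLEU-4 against one reference.

    Assumption: whitespace tokenization and no smoothing, so any
    n-gram order with zero matches makes the whole score 0.
    """
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: a candidate n-gram is credited at most as
        # many times as it occurs in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the four modified n-gram precisions.
    return bp * math.exp(sum(log_precisions) / 4)


if __name__ == "__main__":
    reference = "add unit tests for the parser module"
    generated = "add unit tests for the lexer module"
    print(f"BLEU-4: {bleu4(reference, generated):.4f}")
```

Because commit messages are short, unsmoothed sentence-level BLEU-4 is often zero when no 4-gram matches. Practical evaluations usually apply smoothing or aggregate counts over the whole test set.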
