Enhancing Code Annotation Reliability: Generative AI's Role in Comment Quality Assessment Models
Seetharam Killivalavan, Durairaj Thenmozhi

TL;DR
This paper demonstrates how integrating generative AI to produce additional labeled code comment data significantly improves the performance of code comment quality assessment models, advancing software engineering tools.
Contribution
The study introduces a novel approach of using generative AI to augment training data, leading to notable performance improvements in code comment classification models.
Findings
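5.78 percentage-point precision increase in the SVM model (0.79 to 0.8478)
2.17 percentage-point recall boost in the ANN model (0.731 to 0.7527)
Improved model performance after adding 1,437 generated code-comment pairs to the 9,048-pair dataset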
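The two headline figures are differences between reported scores, so they are absolute gains (percentage points) rather than relative ones. A small Python sketch, using only the scores reported in the abstract, makes the distinction concrete. The function name `gain` is hypothetical and chosen here for illustration.

```python
def gain(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_pp = (after - before) * 100
    relative_pct = (after - before) / before * 100
    return round(absolute_pp, 2), round(relative_pct, 2)


# Scores as reported in the abstract.
svm_precision = gain(0.79, 0.8478)   # SVM precision before and after augmentation
ann_recall = gain(0.731, 0.7527)     # ANN recall before and after augmentation

print("SVM precision (points, relative %):", svm_precision)
print("ANN recall (points, relative %):", ann_recall)
```

Read this way, the reported 5.78 and 2.17 are point differences; relative to the baseline scores, the gains are roughly 7.3% and 3.0%.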
Abstract
This paper explores a novel method for enhancing binary classification models that assess code comment quality, leveraging Generative Artificial Intelligence to elevate model performance. By integrating 1,437 newly generated code-comment pairs, labeled as "Useful" or "Not Useful" and sourced from various GitHub repositories, into an existing C-language dataset of 9,048 pairs, we demonstrate substantial model improvements. Using an advanced Large Language Model, our approach yields a 5.78 percentage-point precision increase in the Support Vector Machine (SVM) model, improving from 0.79 to 0.8478, and a 2.17 percentage-point recall boost in the Artificial Neural Network (ANN) model, rising from 0.731 to 0.7527. These results underscore Generative AI's value in advancing code comment classification models, offering significant potential for enhanced accuracy in software development and quality control. This study…
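To make the setup in the abstract concrete, the following is a minimal sketch of the dataset arithmetic and of how precision and recall are computed for a binary "Useful" / "Not Useful" classifier. The dataset sizes come from the abstract. The function `precision_recall` and the toy labels are assumptions for illustration only; they are not the authors' code or data.

```python
EXISTING_PAIRS = 9_048    # existing C-language dataset (from the abstract)
GENERATED_PAIRS = 1_437   # newly generated pairs (from the abstract)

total_pairs = EXISTING_PAIRS + GENERATED_PAIRS
generated_share = round(GENERATED_PAIRS / total_pairs * 100, 2)  # percent of the combined set


def precision_recall(y_true, y_pred, positive="Useful"):
    """Compute precision and recall for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels, purely illustrative (not taken from the paper's dataset).
y_true = ["Useful", "Useful", "Not Useful", "Not Useful", "Useful"]
y_pred = ["Useful", "Not Useful", "Useful", "Not Useful", "Useful"]

precision, recall = precision_recall(y_true, y_pred)
print(total_pairs, generated_share, precision, recall)
```

Under these figures the combined dataset holds 10,485 pairs, of which the generated portion is a little under 14%.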
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research
