When simplicity meets effectiveness: Detecting code comments coherence with word embeddings and LSTM
Michael Dubem Igbomezie, Phuong T. Nguyen, Davide Di Ruscio

TL;DR
This paper introduces Co3D, a practical method for detecting code-comment coherence using word embeddings and LSTM. It emphasizes the internal meaning and order of words and outperforms existing approaches.
Contribution
The work presents a simple yet effective approach that combines word2vec and LSTM for code-comment coherence detection, highlighting that the internal features of comments and code matter more than model complexity.
Findings
Co3D outperforms well-established baselines in coherence detection.
Simple architectures can achieve satisfactory prediction performance.
Emphasizing internal meaning and word order improves coherence detection accuracy.
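This page does not describe Co3D's exact architecture, so the following is only a toy illustration of the general idea of feeding word embeddings through an LSTM. It makes several assumptions. Random vectors stand in for trained word2vec embeddings. The single-layer LSTM is untrained and written in plain Python. The scoring head `coherence_probability` is hypothetical: it reads the comment tokens, a `<sep>` marker, and the code tokens as one sequence. The sketch shows the data flow and why an LSTM is sensitive to word order when averaged embeddings are not. It is not the authors' implementation.

```python
import math
import random

def make_embeddings(vocab, dim, seed=0):
    """Hypothetical stand-in for trained word2vec vectors: one random vector per token."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """Single-layer LSTM cell with untrained random weights (illustrative only)."""

    def __init__(self, input_dim, hidden_dim, seed=1):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate over the concatenation [x; h_prev], plus a zero bias.
        def mat():
            return [[rng.uniform(-0.5, 0.5) for _ in range(input_dim + hidden_dim)]
                    for _ in range(hidden_dim)]
        self.W = {g: mat() for g in ("i", "f", "o", "g")}
        self.b = {g: [0.0] * hidden_dim for g in ("i", "f", "o", "g")}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(self.W[gate], self.b[gate])]

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation [x; h_prev]
        i = [sigmoid(v) for v in self._affine("i", z)]    # input gate
        f = [sigmoid(v) for v in self._affine("f", z)]    # forget gate
        o = [sigmoid(v) for v in self._affine("o", z)]    # output gate
        g = [math.tanh(v) for v in self._affine("g", z)]  # candidate cell state
        c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def encode(self, vectors):
        """Run the cell over the sequence in order; return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in vectors:
            h, c = self.step(x, h, c)
        return h

def mean_pool(vectors):
    """Order-insensitive baseline: average the embeddings component-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def coherence_probability(comment_tokens, code_tokens, emb, cell, w_out, b_out=0.0):
    """Hypothetical head: LSTM over comment + <sep> + code tokens, then a logistic output."""
    seq = [emb[t] for t in comment_tokens + ["<sep>"] + code_tokens]
    h = cell.encode(seq)
    return sigmoid(sum(w * v for w, v in zip(w_out, h)) + b_out)

# Toy example pair (hypothetical data).
comment = ["returns", "the", "max", "value"]
code = ["def", "get_max", "(", "xs", ")", ":", "return", "max", "(", "xs", ")"]
vocab = sorted(set(comment + code + ["<sep>"]))  # sorted for a deterministic embedding table
emb = make_embeddings(vocab, dim=8)
cell = LSTMCell(input_dim=8, hidden_dim=6)
rng_out = random.Random(2)
w_out = [rng_out.uniform(-1, 1) for _ in range(6)]

p = coherence_probability(comment, code, emb, cell, w_out)

# Word order: reversing the comment leaves the averaged embedding unchanged
# (up to float rounding) but generally changes the LSTM's final state.
vecs = [emb[t] for t in comment]
vecs_rev = [emb[t] for t in reversed(comment)]
h_fwd = cell.encode(vecs)
h_rev = cell.encode(vecs_rev)
```

In a trained model, the embeddings would be learned with word2vec and the LSTM and output weights would be fitted on labeled coherent and incoherent comment-code pairs. The only design point the sketch is meant to convey is that the recurrent state carries token order into the final representation, whereas a bag-of-words average discards it.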
Abstract
Code comments play a crucial role in software development, as they provide programmers with practical information that helps them better understand the intent and semantics of the underlying code. Nevertheless, developers tend to leave comments unchanged after updating the code, resulting in a discrepancy between the two artifacts. Such a discrepancy may cause misunderstanding and confusion among developers, impeding various activities, including code comprehension and maintenance. Thus, it is crucial to determine whether, given a code snippet, its corresponding comment is coherent and accurately reflects the intent behind the code. Unfortunately, existing approaches to this problem, while obtaining encouraging performance, either rely on heavily pre-trained models or treat input data as text, neglecting the intrinsic features contained in comments and code, including word order and…
Taxonomy
Topics: Software Engineering Research · Advanced Text Analysis Techniques · Topic Modeling
