Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification
Fabian C. Peña, Steffen Herbold

TL;DR
This paper assesses Sentence-BERT's effectiveness and efficiency in multi-label code comment classification, demonstrating a trade-off between model size, accuracy, and computational cost.
Contribution
It provides a detailed evaluation of Sentence-BERT models for code comment classification, balancing performance gains with efficiency constraints.
Findings
Larger models achieve higher F1 scores.
Smaller models offer better runtime and GFLOPS efficiency.
A balanced model improves F1 by 0.0346 at a cost of 1.4x in runtime and 2.1x in GFLOPS.
Abstract
This work evaluates Sentence-BERT for a multi-label code comment classification task, seeking to maximize classification performance while controlling efficiency constraints during inference. Using a dataset of 13,216 labeled comment sentences, Sentence-BERT models are fine-tuned and combined with different classification heads to recognize comment types. While larger models outperform smaller ones in terms of F1, the latter offer outstanding efficiency, both in runtime and GFLOPS. As a result, a balance between a reasonable F1 improvement (+0.0346) and a minimal efficiency degradation (+1.4x in runtime and +2.1x in GFLOPS) is reached.
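The two quantities the abstract trades off, F1 on a multi-label task and relative inference cost, can be illustrated with a minimal sketch. The abstract does not say how F1 is averaged across comment types, so the macro average below is an assumption. The function names, the toy labels, and the timing values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def f1_per_label(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> List[float]:
    """Compute F1 separately for each label column of a multi-label task."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores) / len(scores)


def relative_cost(candidate: float, baseline: float) -> float:
    """Ratio of a candidate model's cost (runtime or GFLOPS) to a baseline's."""
    return candidate / baseline


# Toy data: 4 comment sentences, 3 hypothetical comment-type labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

per_label = f1_per_label(y_true, y_pred)
overall = macro_f1(y_true, y_pred)

# Hypothetical per-batch runtimes in milliseconds, chosen only for illustration.
runtime_ratio = relative_cost(candidate=14.0, baseline=10.0)
```

Comparing two models then amounts to reporting the difference in `macro_f1` next to the `relative_cost` ratios for runtime and GFLOPS, which is the form in which the abstract states its result.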
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Text and Document Classification Technologies
