Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification

Fabian C. Peña; Steffen Herbold

arXiv:2506.08581 · cs.SE · June 16, 2025

TL;DR

This paper assesses Sentence-BERT's effectiveness and efficiency in multi-label code comment classification, demonstrating a trade-off between model size, accuracy, and computational cost.

Contribution

It provides a detailed evaluation of Sentence-BERT models for code comment classification, balancing performance gains with efficiency constraints.

Findings

01. Larger models achieve higher F1 scores.

02. Smaller models are more efficient, both in runtime and in GFLOPS.

03. A balanced model improves F1 by 0.0346 at a cost of 1.4x the runtime and 2.1x the GFLOPS.

Abstract

This work evaluates Sentence-BERT for a multi-label code comment classification task, seeking to maximize classification performance while controlling efficiency constraints during inference. Using a dataset of 13,216 labeled comment sentences, Sentence-BERT models are fine-tuned and combined with different classification heads to recognize comment types. While larger models outperform smaller ones in terms of F1, the latter offer outstanding efficiency, both in runtime and GFLOPS. As a result, a balance between a reasonable F1 improvement (+0.0346) and a minimal efficiency degradation (+1.4x in runtime and +2.1x in GFLOPS) is reached.
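
Because the task is multi-label, each comment sentence can carry several comment types at once, so F1 is usually computed per label and then averaged. The abstract does not say which averaging the authors report; the following is a minimal sketch that assumes a macro average over labels. The label names and the toy predictions are hypothetical and are not taken from the paper's dataset.

```python
from typing import List, Sequence

# Hypothetical label names, for illustration only.
LABELS = ["summary", "usage", "deprecation"]


def per_label_f1(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> List[float]:
    """Return one F1 score per label column of two 0/1 indicator matrices."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # A label with no true and no predicted positives is scored 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores = per_label_f1(y_true, y_pred)
    return sum(scores) / len(scores)


# Toy data: rows are comment sentences, columns follow LABELS.
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]

if __name__ == "__main__":
    for name, score in zip(LABELS, per_label_f1(y_true, y_pred)):
        print(f"{name}: {score:.4f}")
    print(f"macro F1: {macro_f1(y_true, y_pred):.4f}")
```

Under this assumption, an improvement such as the reported +0.0346 would be the difference between two such averaged scores obtained on the same test set.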

Code & Models

Repositories

aieng-lab/sbert-comment-classification (Official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Text and Document Classification Technologies