Multi-granular Software Annotation using File-level Weak Labelling

Cezar Sas; Andrea Capiluppi

arXiv:2311.11607 · cs.SE · December 1, 2023 · 1 citation

TL;DR

This paper introduces a method that combines file-level weak labelling with hierarchical aggregation to annotate software at multiple granularities (files, packages, and whole projects), with the aim of helping developers understand unfamiliar code bases more quickly.

Contribution

The paper proposes an approach that derives file-level annotations from project-level labels through weak labelling, then aggregates them hierarchically to annotate packages and whole projects, avoiding manual file-by-file labelling.

Findings

01. Correctly annotated 50% of files

02. Annotated over 50% of packages

03. Identified three new relevant labels per project on average

Abstract

One of the most time-consuming tasks for developers is comprehending a new code base. An effective way to aid this process is to label source code files with meaningful annotations, which help developers understand the content and functionality of a code base more quickly. However, most existing solutions for code annotation focus on project-level classification: manually labelling individual files is time-consuming, error-prone and hard to scale. The work presented in this paper aims to automate the annotation of files by leveraging project-level labels, and then to use the file-level annotations to annotate items at larger levels of granularity, for example packages and the whole project. We propose a novel approach that annotates source code files through weak labelling followed by hierarchical aggregation. We investigate whether this approach is effective in…
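The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each file already carries a score per label (as a weak labelling step might produce) and that a package or project score is the plain mean of its files' scores. The function names `aggregate` and `_mean`, the example paths, and the labels are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def _mean(score_dicts):
    """Average a list of label -> score dicts, label by label."""
    totals = defaultdict(float)
    for scores in score_dicts:
        for label, value in scores.items():
            totals[label] += value
    return {label: total / len(score_dicts) for label, total in totals.items()}


def aggregate(file_scores):
    """Roll file-level label scores up to packages and to the project.

    file_scores maps a file path to a dict of label -> score in [0, 1].
    Assumption: a package is the file's parent directory, and aggregation
    is an unweighted mean. The paper may use a different rule.
    """
    per_package = defaultdict(list)
    for path, scores in file_scores.items():
        package = str(PurePosixPath(path).parent)
        per_package[package].append(scores)

    package_scores = {pkg: _mean(items) for pkg, items in per_package.items()}
    project_scores = _mean(list(file_scores.values()))
    return package_scores, project_scores


# Hypothetical file-level scores for a three-file project.
file_scores = {
    "app/db/models.py": {"database": 1.0, "web": 0.0},
    "app/db/query.py": {"database": 0.5, "web": 0.5},
    "app/ui/views.py": {"database": 0.0, "web": 1.0},
}
package_scores, project_scores = aggregate(file_scores)
```

In this toy input the `app/db` package ends up dominated by the `database` label and `app/ui` by `web`, while the project as a whole is split evenly between the two. A weighted mean (for example by file size) would be an equally plausible choice.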

Code & Models

Repositories

sascezar/codegraphclassification (Official)

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Software Engineering Techniques and Practices