Multi-granular Software Annotation using File-level Weak Labelling
Cezar Sas, Andrea Capiluppi

TL;DR
This paper introduces a weak labelling and hierarchical aggregation method for multi-granular software annotation, intended to help developers understand large codebases more quickly.
Contribution
The paper combines file-level weak labelling with hierarchical aggregation to automate code annotation at several levels of granularity (files, packages, and whole projects), removing the need to label individual files by hand.
Findings
Correctly annotated 50% of files
Annotated over 50% of packages
Identified three new relevant labels per project on average
Abstract
One of the most time-consuming tasks for developers is the comprehension of new code bases. An effective way to aid this process is to label source code files with meaningful annotations, which can help developers understand the content and functionality of a code base more quickly. However, most existing solutions for code annotation focus on project-level classification, because manually labelling individual files is time-consuming, error-prone and hard to scale. The work presented in this paper aims to automate the annotation of files by leveraging project-level labels, and to use the file-level annotations to annotate items at larger levels of granularity, for example packages and a whole project. We propose a novel approach that annotates source code files through weak labelling followed by hierarchical aggregation. We investigate whether this approach is effective in…
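The two stages named in the abstract, file-level weak labelling and hierarchical aggregation, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword lists in `KEYWORDS`, the keyword-count scoring in `weak_label_file`, and the majority vote in `aggregate` are all assumptions chosen only to make the idea concrete.

```python
from collections import Counter, defaultdict
from pathlib import PurePosixPath

# Hypothetical keyword lists per label (an assumption for illustration;
# the paper's actual label sources are not reproduced here).
KEYWORDS = {
    "networking": {"socket", "http", "request"},
    "database": {"sql", "query", "cursor"},
}


def weak_label_file(text, keywords=KEYWORDS):
    """Return the label whose keywords occur most often in the text,
    or None when no keyword matches at all."""
    tokens = text.lower().split()
    scores = {
        label: sum(token in words for token in tokens)
        for label, words in keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def aggregate(child_labels):
    """Group labelled children by parent directory and keep the most
    frequent label per parent (simple majority vote, an assumption)."""
    votes = defaultdict(Counter)
    for path, label in child_labels.items():
        if label is not None:  # unlabelled children do not vote
            votes[str(PurePosixPath(path).parent)][label] += 1
    return {parent: counts.most_common(1)[0][0] for parent, counts in votes.items()}


# Toy input: file paths mapped to the text extracted from each file.
files = {
    "app/net/client.py": "open socket send http request",
    "app/net/server.py": "socket listen http",
    "app/db/store.py": "run sql query with cursor",
    "app/db/util.py": "helper functions",
}

file_labels = {path: weak_label_file(text) for path, text in files.items()}
package_labels = aggregate(file_labels)
```

In this toy input, `app/db/util.py` matches no keyword and stays unlabelled, so it contributes no vote to its package. The same grouping step could be repeated on `package_labels` to move one level further up the hierarchy.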
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Engineering Techniques and Practices
