Generative AI for Software Metadata: Overview of the Information Retrieval in Software Engineering Track at FIRE 2023
Srijoni Majumdar, Soumen Paul, Debjyoti Paul, Ayan Bandyopadhyay, Samiran Chattopadhyay, Partha Pratim Das, Paul D Clough, Prasenjit Majumder

TL;DR
This paper reviews the IRSE track at FIRE 2023, which focuses on automated evaluation of code comments using machine learning, and highlights how labels generated by large language models affect bias and overfitting in the classification task.
Contribution
It provides an overview of 56 experiments from 17 teams applying ML to classify code comments, analyzing the effects of large language model-generated labels on bias and overfitting.
Findings
Large language model labels increase prediction bias.
Using LLM-generated labels reduces overfitting.
Multiple ML approaches and features were evaluated.
Abstract
The Information Retrieval in Software Engineering (IRSE) track aims to develop solutions for automated evaluation of code comments in a machine learning framework based on human and large language model generated labels. In this track, there is a binary classification task to classify comments as useful or not useful. The dataset consists of 9048 pairs of code comments and surrounding code snippets extracted from open-source, C-based GitHub projects, and an additional dataset generated individually by teams using large language models. Overall, 56 experiments were submitted by 17 teams from various universities and software companies. The submissions were evaluated quantitatively using the F1-Score and qualitatively based on the type of features developed, the supervised learning model used, and the corresponding hyper-parameters. The labels generated from large language models…
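The quantitative metric named in the abstract, the F1-Score, can be illustrated with a minimal Python sketch. This is not the track's evaluation script: the function name `f1_score`, the label strings, and the choice of "useful" as the positive class are assumptions made for illustration, and the excerpt does not say whether the organizers used a per-class or an averaged F1.

```python
def f1_score(y_true, y_pred, positive="useful"):
    """Binary F1 for one positive class (hypothetical helper, not the track's code)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for this example only.
gold = ["useful", "useful", "not useful", "not useful"]
pred = ["useful", "not useful", "useful", "not useful"]
score = f1_score(gold, pred)
print(score)
```

With these toy labels there is one true positive, one false positive, and one false negative, so precision and recall are both 0.5 and the F1 is 0.5.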
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices
