Calibration of Large Language Models on Code Summarization
Yuvraj Virk, Premkumar Devanbu, Toufique Ahmed

TL;DR
This paper investigates how to calibrate large language models to reliably assess whether their code summaries are sufficiently similar to human-generated ones, addressing the challenge of evaluating AI-generated summaries without human references.
Contribution
The study introduces methods to predict the likelihood that an LLM-generated code summary resembles a human-written summary, enhancing the reliability of automated code summarization evaluation.
Findings
The proposed approaches can predict how likely an LLM-generated summary is to be sufficiently similar to a human-written one.
Calibration methods work across multiple languages and LLMs.
Reliable confidence measures improve trust in AI-generated code summaries.
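To make the idea of a "reliable confidence measure" concrete, here is a minimal sketch of how calibration could be checked once each generated summary has a confidence value and a similarity score against a human reference. It is not the authors' method, which this page does not describe. The function names, the similarity threshold, the bin count, and the choice of Brier score and expected calibration error (two standard calibration diagnostics) are all assumptions made for illustration.

```python
from typing import List, Sequence


def binarize(similarities: Sequence[float], threshold: float) -> List[int]:
    """Label a summary 1 if its similarity to the human reference meets the
    threshold, else 0. The threshold is a hypothetical choice, not a value
    taken from the paper."""
    return [1 if s >= threshold else 0 for s in similarities]


def brier_score(confidences: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared gap between predicted confidence and the 0/1 outcome.
    Lower is better; 0.0 means every prediction was certain and correct."""
    return sum((c - y) ** 2 for c, y in zip(confidences, labels)) / len(labels)


def expected_calibration_error(
    confidences: Sequence[float], labels: Sequence[int], n_bins: int = 5
) -> float:
    """Equal-width-bin ECE: the size-weighted average, over bins, of
    |fraction of positives - mean confidence|."""
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, labels):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))

    total = len(labels)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(frac_pos - mean_conf)
    return ece


# Toy data, invented for illustration only.
similarities = [0.92, 0.85, 0.60, 0.95, 0.55, 0.88]  # e.g. a similarity metric vs. a human summary
confidences = [0.9, 0.8, 0.3, 0.9, 0.2, 0.7]         # model's predicted chance of being "similar enough"

labels = binarize(similarities, threshold=0.8)
print("Brier score:", brier_score(confidences, labels))
print("ECE:", expected_calibration_error(confidences, labels))
```

A well-calibrated confidence measure would drive both numbers toward zero: among summaries assigned a confidence near 0.8, roughly 80% should in fact clear the similarity threshold.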
Abstract
A brief, fluent, and relevant summary can be helpful during program comprehension; however, such a summary does require significant human effort to produce. Often, good summaries are unavailable in software projects, which makes maintenance more difficult. There has been a considerable body of research into automated AI-based methods, using Large Language Models (LLMs), to generate summaries of code; there has also been quite a bit of work on ways to measure the performance of such summarization methods, with special attention paid to how closely these AI-generated summaries resemble a summary a human might have produced. Measures such as BERTScore and BLEU have been suggested and evaluated with human-subject studies. However, LLM-generated summaries can be inaccurate, incomplete, etc.: generally, too dissimilar to one that a good developer might write. Given an LLM-generated code…
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling
