Calibration of Large Language Models on Code Summarization

Yuvraj Virk; Premkumar Devanbu; Toufique Ahmed

arXiv:2404.19318·cs.SE·June 3, 2025·1 cites

Open Access

TL;DR

This paper investigates how to calibrate large language models to reliably assess whether their code summaries are sufficiently similar to human-generated ones, addressing the challenge of evaluating AI-generated summaries without human references.

Contribution

The study introduces methods to predict the likelihood that an LLM-generated code summary resembles a human-written summary, enhancing the reliability of automated code summarization evaluation.
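As a rough illustration of what "calibration" means in this setting, the sketch below scores a set of confidence values against binary outcomes using the Brier score and expected calibration error (ECE), two standard calibration measures. It is not taken from the paper: the function names, the equal-width binning, and the idea of labeling an outcome as 1 when a generated summary counts as "sufficiently similar" to a human one are all assumptions made for the example.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared gap between predicted confidence and the 0/1 outcome."""
    pairs = list(zip(confidences, outcomes))
    return sum((c - o) ** 2 for c, o in pairs) / len(pairs)


def expected_calibration_error(
    confidences: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted average of |mean confidence - observed rate| over equal-width bins."""
    total = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, o in zip(confidences, outcomes):
        # A confidence of exactly 1.0 is placed in the last bin.
        index = min(int(c * n_bins), n_bins - 1)
        bins[index].append((c, o))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        observed_rate = sum(o for _, o in members) / len(members)
        ece += (len(members) / total) * abs(mean_conf - observed_rate)
    return ece


# Hypothetical data: outcome 1 means the summary was judged "similar enough"
# to a human-written reference (the threshold itself is an assumption).
example_confidences = [0.9, 0.9, 0.1, 0.1]
example_outcomes = [1, 1, 0, 0]
example_brier = brier_score(example_confidences, example_outcomes)
example_ece = expected_calibration_error(example_confidences, example_outcomes)
```

Lower values are better for both measures. A model that reports 0.9 confidence should, over many summaries, be right about 90% of the time.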

Findings

1. The proposed approaches can predict how likely an AI-generated summary is to resemble a human-written summary.

2. Calibration methods work across multiple languages and LLMs.

3. Reliable confidence measures improve trust in AI-generated code summaries.

Abstract

A brief, fluent, and relevant summary can be helpful during program comprehension; however, such a summary requires significant human effort to produce. Often, good summaries are unavailable in software projects, which makes maintenance more difficult. There has been a considerable body of research into automated AI-based methods, using Large Language Models (LLMs), to generate summaries of code; there has also been quite a bit of work on ways to measure the performance of such summarization methods, with special attention paid to how closely these AI-generated summaries resemble a summary a human might have produced. Measures such as BERTScore and BLEU have been suggested and evaluated with human-subject studies. However, LLM-generated summaries can be inaccurate, incomplete, etc.: generally, too dissimilar to one that a good developer might write. Given an LLM-generated code…

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling