ReFEree: Reference-Free and Fine-Grained Method for Evaluating Factual Consistency in Real-World Code Summarization

Suyoung Bae; CheolWon Na; Jaehoon Lee; Yumin Lee; YunSeok Choi; Jee-Hyong Lee

arXiv:2604.10520 · cs.CL · April 14, 2026

TL;DR

ReFEree is a reference-free, segment-level method for evaluating factual consistency in real-world code summaries. It aligns more closely with human judgment than previous approaches.

Contribution

The paper introduces a fine-grained, dependency-aware evaluation framework designed for multi-sentence code summaries, together with a new benchmark, and reports improved correlation with human assessments.

Findings

01

ReFEree achieves the highest correlation with human judgment among 13 baselines.

02

It improves over the previous state of the art by 15-18%.

03

The method effectively evaluates factual consistency at the segment level.

Abstract

As Large Language Models (LLMs) have become capable of generating long and descriptive code summaries, accurate and reliable evaluation of factual consistency has become a critical challenge. However, previous evaluation methods are primarily designed for short summaries of isolated code snippets. Consequently, they struggle to provide fine-grained evaluation of multi-sentence functionalities and fail to accurately assess dependency context commonly found in real-world code summaries. To address this, we propose ReFEree, a reference-free and fine-grained method for evaluating factual consistency in real-world code summaries. We define factual inconsistency criteria specific to code summaries and evaluate them at the segment level using these criteria along with dependency information. These segment-level results are then aggregated into a fine-grained score. We construct a code…
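The segment-level flow described in the abstract (split the summary into segments, judge each segment against the code and its dependency context, then aggregate) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the sentence-based splitter, the `Judge` callable (a stand-in for whatever model applies the inconsistency criteria), and the plain fraction-of-consistent-segments aggregation are hypothetical and are not taken from the paper or its repository.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical judge signature: (code, dependency_context, segment) -> is_consistent.
# In practice this would wrap a model call; here it is only a placeholder type.
Judge = Callable[[str, str, str], bool]


@dataclass
class SegmentVerdict:
    segment: str
    consistent: bool


def split_into_segments(summary: str) -> List[str]:
    """Naive sentence split, used as a stand-in for real segmentation."""
    parts = re.split(r"(?<=[.!?])\s+", summary.strip())
    return [p for p in parts if p]


def aggregate(verdicts: List[SegmentVerdict]) -> float:
    """Assumed aggregation: fraction of segments judged consistent."""
    if not verdicts:
        return 0.0
    return sum(v.consistent for v in verdicts) / len(verdicts)


def evaluate_summary(
    code: str, dependency_context: str, summary: str, judge: Judge
) -> Tuple[float, List[SegmentVerdict]]:
    """Judge every segment, then combine the verdicts into one score."""
    verdicts = [
        SegmentVerdict(seg, judge(code, dependency_context, seg))
        for seg in split_into_segments(summary)
    ]
    return aggregate(verdicts), verdicts
```

Returning the per-segment verdicts alongside the score keeps the result fine-grained: a caller can see which sentence of a multi-sentence summary was flagged, not only the overall number.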

Code & Models

Repositories

bsy99615/ReFEree.git (GitHub)
