ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries

Kishan Maharaj; Vitobha Munigala; Srikanth G. Tamilselvam; Prince Kumar; Sayandeep Sen; Palani Kodeswaran; Abhijit Mishra; Pushpak Bhattacharyya

arXiv:2410.14748·cs.SE·September 9, 2025

TL;DR

This paper introduces ETF, a framework that combines static program analysis with LLMs to detect hallucinations in code summaries. It is supported by a new dataset, CodeSumEval (~10K samples), and reaches a 73% F1 score.

Contribution

The paper presents the first dataset for hallucination detection in code summarisation and proposes ETF, a new entity tracing framework combining static analysis and LLM verification.

Findings

01. Achieved 73% F1 score in hallucination detection
02. Created the first dataset with ~10K samples for this task
03. Demonstrated ETF's effectiveness in localising errors

Abstract

Recent advancements in large language models (LLMs) have significantly enhanced their ability to understand both natural language and code, driving their use in tasks like natural language-to-code (NL2Code) and code summarisation. However, LLMs are prone to hallucination, that is, outputs that stray from the intended meaning. Detecting hallucinations in code summarisation is especially difficult due to the complex interplay between programming and natural languages. We introduce a first-of-its-kind dataset, CodeSumEval, with ~10K samples, curated specifically for hallucination detection in code summarisation. We further propose a novel Entity Tracing Framework (ETF) that a) utilises static program analysis to identify code entities from the program and b) uses LLMs to map and verify these entities and their intents within generated code summaries. Our experimental analysis demonstrates the…
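The two-step idea in the abstract (identify code entities by static analysis, then check the entities a summary mentions against them) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Python source and uses the standard `ast` module as a stand-in for static program analysis. The function names `extract_entities`, `summary_mentions`, and `trace_entities` are hypothetical, as is the convention that entities in the summary are wrapped in backticks. Exact string matching replaces the LLM-based mapping step, and verification of entity intents is not covered at all.

```python
import ast
import re


def extract_entities(source):
    """Collect identifier names from the code via its AST (static analysis stand-in)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)   # function and class names
        elif isinstance(node, ast.arg):
            names.add(node.arg)    # function parameters
        elif isinstance(node, ast.Name):
            names.add(node.id)     # variables and referenced names
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)   # attribute accesses such as item.price
    return names


def summary_mentions(summary):
    """Assumption: entities in the summary are marked with backticks."""
    return set(re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", summary))


def trace_entities(source, summary):
    """Split summary mentions into those found in the code and those that are not."""
    code_entities = extract_entities(source)
    mentioned = summary_mentions(summary)
    return {
        "grounded": sorted(mentioned & code_entities),
        # Mentions with no counterpart in the code are hallucination candidates.
        "untraced": sorted(mentioned - code_entities),
    }


CODE = '''
def total_price(items, tax_rate):
    subtotal = sum(item.price for item in items)
    return subtotal * (1 + tax_rate)
'''

SUMMARY = "Computes `subtotal` from `items`, applies `tax_rate`, and subtracts `discount`."

result = trace_entities(CODE, SUMMARY)
print(result)
```

Here `discount` is reported as untraced because no such identifier appears in the code. A mention that does trace back to the code can still be described incorrectly, which is the part this sketch leaves out and which the abstract assigns to LLM-based verification.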

Code & Models

Datasets

kishanmaharaj/ETF-CodeSumEval
dataset · 35 downloads

Videos

ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries· underline

Taxonomy

Topics: Advanced Text Analysis Techniques · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies