HalluJudge: A Reference-Free Hallucination Detection for Context Misalignment in Code Review Automation

Kla Tantithamthavorn; Hong Yi Lin; Patanamon Thongtanunam; Wachiraphan Charoenwet; Minwoo Jeong; Ming Wu

arXiv:2601.19072·cs.SE·March 26, 2026

Open Access

TL;DR

HalluJudge is a scalable, reference-free method for detecting hallucinations in LLM-generated code review comments by assessing their alignment with code context, improving trust and reliability in AI-assisted reviews.

Contribution

This work introduces HalluJudge, a novel approach with multiple strategies for hallucination detection in code reviews without needing reference comments.

Findings

01

HalluJudge achieves an F1 score of 0.85 in hallucination detection.

02

It is cost-effective, averaging $0.009 per assessment.

03

67% of its judgments align with developer preferences.
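
For readers unfamiliar with the metric in the first finding, the F1 score is the harmonic mean of precision and recall. The snippet below is a minimal sketch of that calculation, assuming binary labels where 1 marks a hallucinated comment. That labeling convention, the function name `f1_score`, and the toy labels are illustrative assumptions, not the paper's data or evaluation code.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (assumed here: 1 = hallucinated comment)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged but grounded
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed hallucination
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, not results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
```

With these toy labels, precision and recall are both 3/4, so the score is 0.75.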

Abstract

Large language models (LLMs) have shown strong capabilities in code review automation, such as review comment generation, yet they suffer from hallucinations, where the generated review comments are ungrounded in the actual code. This poses a significant challenge to the adoption of LLMs in code review workflows. To address this, we explore effective and scalable methods for hallucination detection in LLM-generated code review comments without reference comments. In this work, we design HalluJudge, which assesses the grounding of generated review comments based on context alignment. HalluJudge includes four key strategies ranging from direct assessment to structured multi-branch reasoning (e.g., Tree-of-Thoughts). We conduct a comprehensive evaluation of these assessment strategies across Atlassian's enterprise-scale software projects to examine the effectiveness and cost-efficiency…

Taxonomy

Topics: Software Engineering Research · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques