HalluJudge: A Reference-Free Hallucination Detection for Context Misalignment in Code Review Automation
Kla Tantithamthavorn, Hong Yi Lin, Patanamon Thongtanunam, Wachiraphan Charoenwet, Minwoo Jeong, Ming Wu

TL;DR
HalluJudge is a scalable, reference-free method for detecting hallucinations in LLM-generated code review comments by assessing their alignment with code context, improving trust and reliability in AI-assisted reviews.
Contribution
This work introduces HalluJudge, a novel approach that detects hallucinations in code review comments without needing reference comments, using four assessment strategies that range from direct assessment to structured multi-branch reasoning.
Findings
HalluJudge achieves an F1 score of 0.85 in hallucination detection.
It is cost-effective, averaging $0.009 per assessment.
67% of its judgments align with developer preferences.
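For readers less familiar with the metrics above, the short Python sketch below shows how an F1 score is derived from detection counts and how the per-assessment cost scales with volume. The counts in the usage note are purely illustrative assumptions, not figures from the paper, and the function names are hypothetical; only the 0.009 dollar average comes from the findings above.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall.

    Equivalent closed form: 2*TP / (2*TP + FP + FN).
    tp: hallucinated comments correctly flagged
    fp: grounded comments wrongly flagged
    fn: hallucinated comments that were missed
    """
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def total_cost(n_assessments: int, cost_per_assessment: float = 0.009) -> float:
    """Estimated spend in dollars, assuming the reported average cost holds."""
    return n_assessments * cost_per_assessment


if __name__ == "__main__":
    # Illustrative counts only (an assumption, not data from the paper).
    print(round(f1_score(tp=85, fp=15, fn=15), 2))
    print(round(total_cost(1000), 2))
```

With the illustrative counts above (85 true positives, 15 false positives, 15 false negatives), precision and recall are both 0.85, so the F1 score is also 0.85. At the reported average, 1,000 assessments would cost roughly 9 dollars.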
Abstract
Large language models (LLMs) have shown strong capabilities in code review automation, such as review comment generation. Yet they suffer from hallucinations, where the generated review comments are not grounded in the actual code, which poses a significant challenge to the adoption of LLMs in code review workflows. To address this, we explore effective and scalable methods for hallucination detection in LLM-generated code review comments without reference comments. In this work, we design HalluJudge, which assesses the grounding of generated review comments based on context alignment. HalluJudge includes four key strategies ranging from direct assessment to structured multi-branch reasoning (e.g., Tree-of-Thoughts). We conduct a comprehensive evaluation of these assessment strategies across Atlassian's enterprise-scale software projects to examine the effectiveness and cost-efficiency…
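As a rough illustration of what the simplest strategy, direct assessment, could look like in practice, the Python sketch below builds a reference-free prompt from a code change and a review comment, sends it to a judge, and parses a one-word verdict. Everything here is an assumption made for illustration: the prompt wording, the GROUNDED/HALLUCINATED labels, and the function names are hypothetical and are not taken from the paper. The `judge` argument is a caller-supplied function that wraps whatever LLM is available, so no particular vendor API is implied.

```python
from typing import Callable

# Hypothetical prompt wording; the paper's actual prompts are not shown here.
PROMPT_TEMPLATE = (
    "You are checking a code review comment against the code change it refers to.\n"
    "Answer GROUNDED if every claim in the comment is supported by the change,\n"
    "or HALLUCINATED if the comment refers to code or behavior that is not there.\n"
    "Put the verdict alone on the first line.\n\n"
    "Code change:\n{diff}\n\n"
    "Review comment:\n{comment}\n"
)


def build_prompt(code_diff: str, review_comment: str) -> str:
    """Assemble the judge prompt; no reference comment is required."""
    return PROMPT_TEMPLATE.format(diff=code_diff, comment=review_comment)


def parse_verdict(raw: str) -> bool:
    """Return True if the judge labeled the comment as hallucinated."""
    lines = raw.strip().splitlines()
    first = lines[0].strip().upper() if lines else ""
    if first.startswith("HALLUCINATED"):
        return True
    if first.startswith("GROUNDED"):
        return False
    raise ValueError(f"Unrecognized verdict: {raw!r}")


def is_hallucinated(
    code_diff: str, review_comment: str, judge: Callable[[str], str]
) -> bool:
    """Direct assessment: one prompt, one verdict."""
    return parse_verdict(judge(build_prompt(code_diff, review_comment)))
```

Passing the judge in as a plain callable keeps the sketch testable with a stub and leaves room to swap in a multi-step strategy later without changing the calling code.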
Taxonomy
Topics: Software Engineering Research · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques
