Code Comprehension then Auditing for Unsupervised LLM Evaluation

Bhrij Patel; Souradip Chakraborty; Mengdi Wang; Dinesh Manocha; Amrit Singh Bedi

arXiv:2410.03131 · cs.AI · April 2, 2026


TL;DR

CoCoA introduces a two-step unsupervised framework for code correctness evaluation that improves interpretability and accuracy by understanding what the code does before assessing whether it is correct.

Contribution

CoCoA proposes a sequential approach that separates code comprehension from correctness evaluation, improving reliability over prior methods that infer program behavior and judge correctness jointly in a single step.

Findings

1. Achieves up to 68% higher F1 score than baselines.

2. Increases accuracy by up to 20% across datasets and programming languages.

3. Improves interpretability by generating natural-language explanations of code behavior.

Abstract

Large Language Models (LLMs) for unsupervised code correctness evaluation have recently gained attention because they can judge whether code behaves as intended without requiring reference implementations or unit tests, which may be unavailable, sparse, or unreliable. However, most prior approaches condition LLM evaluators directly on the full code implementation, forcing the model to jointly infer program behavior and evaluate correctness in a single step. This entanglement leads to misinterpretations of code behavior and unreliable judgments. To mitigate this issue, we introduce CoCoA, an unsupervised Code Comprehension then Auditing framework that first comprehends the code's functionality to generate a natural-language explanation and then evaluates task alignment based on this explanation. By sequentially sampling comprehension before evaluation, CoCoA improves the quality of inferred program…
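The two-stage idea described in the abstract (comprehend first, then audit) can be sketched as a small prompting pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable type, both prompt templates, the CORRECT/INCORRECT answer format, and the helper names `comprehend`, `audit`, `cocoa_evaluate`, and `stub_llm` are all hypothetical. The sketch also assumes the auditing step sees only the task and the generated explanation, never the code itself.

```python
from typing import Callable, Tuple

# Hypothetical interface: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]

# Illustrative prompt templates (assumptions, not the paper's actual prompts).
COMPREHEND_PROMPT = (
    "Describe, step by step and in plain English, what the following code does. "
    "Do not judge whether it is correct.\n\n{code}"
)
AUDIT_PROMPT = (
    "Task:\n{task}\n\n"
    "Description of a candidate solution:\n{explanation}\n\n"
    "Does the described behavior fully solve the task? "
    "Answer on the last line with exactly CORRECT or INCORRECT."
)


def comprehend(code: str, llm: LLM) -> str:
    """Stage 1: produce a natural-language explanation of the code's behavior."""
    return llm(COMPREHEND_PROMPT.format(code=code)).strip()


def audit(task: str, explanation: str, llm: LLM) -> bool:
    """Stage 2: judge task alignment from the explanation alone (an assumption here)."""
    reply = llm(AUDIT_PROMPT.format(task=task, explanation=explanation))
    lines = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    # Normalize the final line; exact comparison keeps "INCORRECT" from matching "CORRECT".
    last = lines[-1].strip(" .*").upper() if lines else ""
    return last == "CORRECT"


def cocoa_evaluate(task: str, code: str, llm: LLM) -> Tuple[bool, str]:
    """Run comprehension, then auditing; return the verdict and the explanation."""
    explanation = comprehend(code, llm)
    return audit(task, explanation, llm), explanation


# Usage with a deterministic stub in place of a real model (illustration only).
def stub_llm(prompt: str) -> str:
    if prompt.startswith("Describe"):
        return "The function returns the sum of its two arguments."
    return "The description matches the task.\nCORRECT"


verdict, explanation = cocoa_evaluate(
    "Add two integers.", "def add(a, b):\n    return a + b", stub_llm
)
print(verdict)
print(explanation)
```

Returning the explanation together with the verdict mirrors the interpretability claim: a reader can inspect the intermediate description to see why a sample was accepted or rejected. Passing only that description to the auditor is what keeps comprehension and evaluation separate in this sketch.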
