Incoherence as Oracle-less Measure of Error in LLM-Based Code Generation

Thomas Valentin; Ardi Madadi; Gaetano Sapia; Marcel Böhme

arXiv:2507.00057·cs.PL·December 16, 2025

TL;DR

This paper introduces 'incoherence,' an oracle-less measure to estimate the likelihood of errors in LLM-generated code, enabling reliable error detection without needing correct reference implementations.

Contribution

The paper proposes incoherence, a metric that can be estimated efficiently without an oracle and that yields a lower bound on the probability that an LLM-generated program is incorrect. Its results align well with traditional oracle-based evaluations.

Findings

01

Incoherence identifies about two-thirds of incorrect programs, with no false positives reported for the average task.

02

An incoherence-based evaluation can reliably stand in for an oracle-based evaluation of LLM code correctness.

03

Incoherence-based rankings of LLMs correlate strongly with oracle-based rankings.

Abstract

Generating code from a natural language programming task is one of the most successful applications of Large Language Models (LLMs). Yet, the generated program may be buggy. Without an oracle, such as an existing, correct implementation or a formal specification, can we somehow estimate how likely the generated program is correct? In this paper, we propose a measure of incorrectness, called *incoherence*, that can be estimated efficiently in the absence of an oracle and allows us to establish a lower bound on the error, i.e., the probability that the LLM-generated program for that specification is incorrect. In our experiments, our incoherence-based methodology can automatically identify about two-thirds of incorrect programs without reports of false positives for the average task. In fact, *an oracle-based evaluation of LLMs can be reliably replaced by an incoherence-based…
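The abstract does not spell out how incoherence is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that several candidate programs are sampled from the LLM for the same task and that incoherence is approximated by how often two candidates behave differently on a shared set of test inputs. The names `pairwise_disagreement`, `error_lower_bound`, and the toy `candidates` are hypothetical. The bound relies on a simple observation: if two deterministic programs disagree on some input, at least one of them is incorrect, so by the union bound the error probability is at least half the disagreement probability.

```python
from itertools import combinations


def _run(program, x):
    """Run a candidate and capture either its result or the exception type."""
    try:
        return ("ok", program(x))
    except Exception as exc:  # a crash counts as observable behavior
        return ("error", type(exc).__name__)


def behaves_differently(f, g, inputs):
    """True if the two candidates disagree on at least one input."""
    return any(_run(f, x) != _run(g, x) for x in inputs)


def pairwise_disagreement(candidates, inputs):
    """Fraction of candidate pairs that disagree on some input.

    Assumption: this stands in for the probability that two programs
    sampled for the same task are observably different.
    """
    pairs = list(combinations(candidates, 2))
    if not pairs:
        return 0.0
    return sum(behaves_differently(f, g, inputs) for f, g in pairs) / len(pairs)


def error_lower_bound(disagreement):
    """Union bound: disagreement <= 2 * error, hence error >= disagreement / 2."""
    return disagreement / 2


# Toy task: "return the absolute value of x". The third candidate is buggy.
candidates = [
    lambda x: abs(x),
    lambda x: x if x >= 0 else -x,
    lambda x: x,
]
inputs = [-2, 0, 3]

disagreement = pairwise_disagreement(candidates, inputs)
bound = error_lower_bound(disagreement)
```

In this toy setup no reference implementation is consulted: the estimate comes entirely from the candidates disagreeing with one another. Candidates that are all wrong in the same way would go unnoticed, which is consistent with the measure giving only a lower bound on the error.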

Code & Models

Repositories

mpi-softsec/difftrust (official)

Videos

Incoherence as Oracle-less Measure of Error in LLM-Based Code Generation (underline)

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques