Correctness Assessment of Code Generated by Large Language Models Using Internal Representations

Tuan-Dung Bui, Thanh Trong Vu, Thu-Trang Nguyen, Son Nguyen, and Hieu Dinh Vo

arXiv:2501.12934 · cs.SE · August 5, 2025

TL;DR

This paper introduces OPENIA, a white-box framework that leverages internal representations of LLMs to accurately assess code correctness, outperforming traditional black-box methods across multiple benchmarks.

Contribution

The paper presents OPENIA, a novel open-box approach utilizing internal LLM states for correctness assessment, demonstrating significant improvements over existing black-box techniques.

Findings

01. Internal representations encode correctness-related information.

02. OPENIA achieves up to 2X accuracy improvement.

03. Enhanced robustness in repository-specific code evaluation.

Abstract

Ensuring the correctness of code generated by Large Language Models (LLMs) presents a significant challenge in AI-driven software development. Existing methods predominantly rely on black-box (closed-box) approaches that evaluate correctness post-generation and fail to utilize the rich insights embedded in the LLMs' internal states during code generation. In this paper, we introduce OPENIA, a novel white-box (open-box) framework that leverages these internal representations to assess the correctness of LLM-generated code. OPENIA systematically analyzes the intermediate states of representative open-source LLMs specialized for code, including DeepSeek-Coder, CodeLlama, and MagicCoder, across diverse code generation benchmarks. Our empirical analysis reveals that these internal representations encode latent information that strongly correlates with the correctness of the generated…
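
The core idea in the abstract, that internal states carry a signal a classifier can read, can be illustrated with a small probing sketch. This is not the authors' implementation: the vectors below are synthetic stand-ins for hidden states, the logistic-regression probe is an assumed choice of classifier, and every function name (`make_synthetic_states`, `train_probe`, `predict`, `accuracy`) is hypothetical.

```python
import math
import random


def make_synthetic_states(n, dim, seed=0):
    """Build synthetic vectors standing in for LLM hidden states (assumption).

    Each sample gets a 0/1 label (1 = "correct code") and is shifted along a
    fixed random direction, so the label is linearly recoverable.
    """
    rng = random.Random(seed)
    direction = [rng.gauss(0, 1) for _ in range(dim)]
    states, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 1.0 if label else -1.0
        states.append([rng.gauss(0, 1) + shift * d for d in direction])
        labels.append(label)
    return states, labels


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(states, labels, epochs=50, lr=0.1):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    w = [0.0] * len(states[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(states, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(w, b, x):
    """Return 1 if the probe judges the sample 'correct', else 0."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


def accuracy(w, b, states, labels):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(w, b, x) == y for x, y in zip(states, labels))
    return hits / len(labels)


# Train on the first 300 samples, evaluate on the remaining 100.
states, labels = make_synthetic_states(400, 16, seed=0)
w, b = train_probe(states[:300], labels[:300])
print("held-out accuracy:", accuracy(w, b, states[300:], labels[300:]))
```

In a real setting the synthetic vectors would be replaced by hidden states captured while a code LLM generates each sample, with labels taken from test execution. Which layers and token positions to read is a design choice this sketch does not address.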

Code & Models

Repositories

ise-uet-vnu/openia · PyTorch · Official

Taxonomy

Topics: Software Engineering Research · Machine Learning and Data Classification · Software Reliability and Analysis Research