Context-Augmented Code Generation Using Programming Knowledge Graphs

Iman Saberi; Fatemeh Fard

arXiv:2410.18251·cs.SE·June 17, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a novel framework using Programming Knowledge Graphs to improve code retrieval and generation in large language models, significantly enhancing accuracy and reducing hallucinations.

Contribution

It presents PKG-based retrieval, tree pruning, re-ranking, and FIM augmentation techniques to advance code generation quality and relevance.

Findings

01

Up to 20% improvement in pass@1 accuracy on HumanEval.

02

Outperforms state-of-the-art models by up to 34% on MBPP.

03

Effective reduction of irrelevant context and hallucinations.
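For readers unfamiliar with the metric in the findings above, here is a minimal sketch of how pass@k is commonly estimated. This summary does not say how the authors computed pass@1, nor whether the reported gains are absolute or relative; the code assumes the widely used unbiased estimator introduced with HumanEval, and the function names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: samples that pass all tests,
    k: budget of attempts. Assumed estimator, not confirmed by the source.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass.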

Abstract

Large Language Models (LLMs) and Code-LLMs (CLLMs) have significantly improved code generation, but they frequently struggle with challenging and complex problems. Retrieval-Augmented Generation (RAG) addresses this issue by retrieving and integrating external knowledge at inference time. However, retrieval models often fail to find the most relevant context, and generation models, with limited context capacity, can hallucinate when given irrelevant data. We present a novel framework that leverages a Programming Knowledge Graph (PKG) to semantically represent and retrieve code. This approach enables fine-grained code retrieval by focusing on the most relevant segments while reducing irrelevant context through a tree-pruning technique. PKG is coupled with a re-ranking mechanism that further reduces hallucinations by selectively integrating non-RAG solutions. We…
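The fine-grained retrieval and tree-pruning idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Node` tree, the bag-of-words `embed` function (a stand-in for a neural encoder), the `min_sim` threshold, and the pruning rule are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use a neural encoder."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class Node:
    """Hypothetical graph node: a function header or a code block."""
    code: str
    children: list["Node"] = field(default_factory=list)

    def text(self) -> str:
        # Full source of this node including all nested blocks.
        return "\n".join([self.code] + [c.text() for c in self.children])


def all_nodes(root: Node):
    yield root
    for child in root.children:
        yield from all_nodes(child)


def retrieve(query: str, roots: list[Node]) -> Node:
    """Return the single node (function or block) most similar to the query."""
    q = embed(query)
    candidates = (n for r in roots for n in all_nodes(r))
    return max(candidates, key=lambda n: cosine(q, embed(n.text())))


def prune(query: str, node: Node, min_sim: float) -> Node:
    """Copy the tree, dropping child branches below the similarity threshold."""
    q = embed(query)
    kept = [
        prune(query, c, min_sim)
        for c in node.children
        if cosine(q, embed(c.text())) >= min_sim
    ]
    return Node(node.code, kept)


# Tiny illustrative corpus (hypothetical data).
func = Node("def stats(xs):", [
    Node("    total = sum(xs)"),
    Node("    mean = total / len(xs)"),
    Node("    print('debug', xs)"),
])
other = Node("def greet(name):", [Node("    return 'hello ' + name")])
query = "compute mean of xs total"

best_block = retrieve(query, [func, other])
pruned_func = prune(query, func, min_sim=0.4)
```

In this toy corpus, block-level retrieval selects the line that computes the mean rather than the whole function, and pruning the function drops the unrelated debug branch before the context would be passed to a generator.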

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 8 · Confidence 4

Strengths

- PKG semantically represents and retrieves code, and provides fine-grained code retrieval by focusing on the most relevant segments while reducing irrelevant context through a tree-pruning technique.
- The results indicate the benefits of using PKG.
- The experiments are thorough, and the paper is well written.

Weaknesses

- A gap still exists between the ideal ranker and the re-ranker; what is the reason for it?
- How reliable are the results? Based on multiple generations with low temperatures.
- Some samples in MBPP and HumanEval have misaligned or incorrect natural-language descriptions; under which error category are these accounted for?
- If block-PKG performs better than func-PKG, why does the ranking still consider func-PKG?
- The paper needs to show cases where extraction with PKG provides a better function than current RAG.

Reviewer 02 · Rating 3 · Confidence 4

Strengths

+ The idea and the presentation are clear.
+ The method is demonstrated to be useful in the given setting, i.e., retrieving code segments from a given code QA dataset. Learning code representations in a knowledge graph is a somewhat novel idea.

Weaknesses

+ The setting seems unusual to me: in real-world applications, code generation is usually augmented with code documentation, natural-language reasoning, or similar question-solution pairs, which is to say, natural language could be used to retrieve helpful information. Focusing only on code representations may be limiting.
+ Accordingly, given the unrealistic setting, the experiments are not convincing enough to me. For example, on both HumanEval and MBPP, using BM25 for RAG consistently gets l

Reviewer 03 · Rating 5 · Confidence 4

Strengths

1. The PKG approach adds a structured layer to context retrieval, improving relevance in code generation.
2. Tree pruning and re-ranking help eliminate irrelevant information, enhancing the quality of generated code.
3. Through function- and block-level code, the framework could provide highly relevant and precise context.
4. The approach demonstrates considerable improvements on established benchmarks like HumanEval and MBPP.

Weaknesses

1. Building and maintaining the Programming Knowledge Graph may be resource-intensive and require domain expertise.
2. The framework's effectiveness may be constrained to specific programming languages (e.g., Python) or code tasks.
3. The paper could benefit from a deeper exploration of cases where PKG and the retrieval mechanisms fail to improve, or potentially hinder, code generation quality.

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Software Testing and Debugging Techniques · Software Engineering Research

Methods: Pruning