Knowledge-Aware Code Generation with Large Language Models

Tao Huang; Zhihong Sun; Zhi Jin; Ge Li; Chen Lyu

arXiv:2401.15940 · cs.SE · February 2, 2024 · 2 cites

TL;DR

This paper introduces KareCoder, a knowledge-aware approach that enhances large language models' ability to solve novel programming problems by integrating a specialized knowledge library, significantly improving performance on unseen tasks.

Contribution

The paper presents a novel dataset, CodeF, and a knowledge integration method, KareCoder, to improve LLMs' problem-solving on unfamiliar programming challenges.

Findings

01. KareCoder improves Pass@1 on CodeF by 23.3% (a relative gain).

02. It outperforms direct ChatGPT code generation on novel problems.

03. It maintains strong performance on previously encountered problems.
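
The Pass@1 figure in the findings can be made concrete with a small sketch. This page does not say how the paper computed the metric, so the code below assumes the unbiased pass@k estimator commonly used for code-generation benchmarks: generate n samples per problem, count the c samples that pass all tests, and estimate 1 - C(n-c, k) / C(n, k). The function names `pass_at_k` and `relative_improvement` are hypothetical, and the numbers in the usage lines are made up for illustration; they are not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem: n samples drawn, c of them correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Illustrative values only (not taken from the paper).
per_problem = [pass_at_k(n=10, c=c, k=1) for c in (3, 0, 10)]
mean_pass_at_1 = sum(per_problem) / len(per_problem)
gain = relative_improvement(baseline=0.20, improved=0.25)
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass. A relative gain divides the difference by the baseline score, so it is larger than the same change expressed in absolute percentage points.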

Abstract

Large Language Models (LLMs) perform well on basic programming problems. However, they encounter challenges when dealing with complex tasks involving the use of diverse algorithmic and data structure skills, particularly programming competition-level problems. Notably, ChatGPT exhibits proficient performance on problems it has encountered during its pre-training phase, but this performance deteriorates when faced with novel problems. Consequently, enhancing the ability of LLMs to address unfamiliar problems has emerged as a pivotal research focus. The problem-solving process of LLMs mirrors human programmers' approach to a certain extent. When confronted with new programming tasks, human programmers engage in task planning and code writing with the previously acquired knowledge about algorithms and data structures. Despite having learned such knowledge, LLMs struggle to effectively…

Code & Models

Repositories

codegeneration3/karecoder (official)

Taxonomy

Topics: Semantic Web and Ontologies · Speech and dialogue systems · Natural Language Processing Techniques