Knowledge-Enhanced Program Repair for Data Science Code

Shuyin Ouyang; Jie M. Zhang; Zeyu Sun; Albert Meroño-Peñuela

arXiv:2502.09771 · cs.SE · February 24, 2025

Open Access

TL;DR

This paper presents DSrepair, a knowledge-enhanced method for repairing data science code generated by LLMs, utilizing knowledge graphs and AST-based bug localization to improve repair accuracy and efficiency.

Contribution

Introduces DSrepair, a novel knowledge-enhanced program repair approach that leverages data science knowledge graphs and AST analysis to improve LLM-generated code repair.
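
To make the knowledge-graph retrieval idea concrete, here is a minimal sketch in Python. It is not the DS-KG schema or the paper's retrieval pipeline: the triple layout, the predicate names (`has_parameter`, `returns`), and the helper functions `build_index`, `called_names`, and `retrieve_api_knowledge` are all assumptions made for illustration. The only library facts it relies on are that `pandas.DataFrame.merge` accepts `how` and `on`, and that `numpy.linalg.norm` accepts `ord` and `axis`.

```python
import ast
from collections import defaultdict

# Toy knowledge graph as (subject, predicate, object) triples.
# The predicate vocabulary is hypothetical, not taken from DS-KG.
TOY_KG = [
    ("pandas.DataFrame.merge", "has_parameter", "how"),
    ("pandas.DataFrame.merge", "has_parameter", "on"),
    ("pandas.DataFrame.merge", "returns", "pandas.DataFrame"),
    ("numpy.linalg.norm", "has_parameter", "ord"),
    ("numpy.linalg.norm", "has_parameter", "axis"),
]

# Snippet to repair; it is only parsed here, never executed.
SNIPPET = (
    "result = left.merge(right, how='inner', on='key')\n"
    "print(result.head())\n"
)


def build_index(triples):
    """Index triples by the last dotted component of the subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject.rsplit(".", 1)[-1]].append((subject, predicate, obj))
    return index


def called_names(source):
    """Collect the short names of every function or method called in source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                names.add(func.attr)   # e.g. left.merge(...) -> "merge"
            elif isinstance(func, ast.Name):
                names.add(func.id)     # e.g. print(...) -> "print"
    return names


def retrieve_api_knowledge(source, index):
    """Return the triples whose subject matches an API called in source."""
    facts = []
    for name in sorted(called_names(source)):
        facts.extend(index.get(name, []))
    return facts


if __name__ == "__main__":
    for fact in retrieve_api_knowledge(SNIPPET, build_index(TOY_KG)):
        print(fact)
```

Matching on the short method name alone is deliberately naive: it cannot tell `DataFrame.merge` apart from a `merge` method on an unrelated class. A real system would need to resolve the receiver's type or the import context before querying the graph.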

Findings

01. DSrepair outperforms all five state-of-the-art LLM-based repair baselines it is compared against.

02. Compared with the second-best baseline, it fixes 14.2% to 44.4% more buggy code snippets, depending on which of the four LLMs is used.

03. It reduces token usage per code task by up to 34.24%.

Abstract

This paper introduces DSrepair, a knowledge-enhanced program repair method designed to repair buggy code generated by LLMs in the data science domain. DSrepair uses knowledge-graph-based retrieval-augmented generation (RAG) for API knowledge retrieval, together with bug knowledge enrichment, to construct repair prompts for LLMs. Specifically, to enable knowledge-graph-based API retrieval, we construct DS-KG (Data Science Knowledge Graph) for widely used data science libraries. For bug knowledge enrichment, we employ an abstract syntax tree (AST) to localize errors at the AST node level. DSrepair's effectiveness is evaluated against five state-of-the-art LLM-based repair baselines using four advanced LLMs on the DS-1000 dataset. The results show that DSrepair surpasses all five baselines. Specifically, when compared to the second-best baseline, DSrepair demonstrates significant improvements, fixing 44.4%, 14.2%, 20.6%,…
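
As a rough illustration of what localizing an error at the AST node level can look like, the sketch below runs a snippet, reads the failing line from the traceback, and maps it to the innermost enclosing statement node. This is a simplified stand-in under stated assumptions, not DSrepair's implementation: the function names (`failing_line`, `innermost_statement`, `describe_bug`), the `<snippet>` pseudo-filename, and the shape of the returned dictionary are all hypothetical. It also assumes the snippet parses, so syntax errors are out of scope.

```python
import ast
import traceback

SNIPPET_NAME = "<snippet>"  # hypothetical pseudo-filename used to tag frames

# Toy buggy snippet: divides by zero when v == 2.
BUGGY = (
    "values = [1, 2, 3]\n"
    "total = 0\n"
    "for v in values:\n"
    "    total += v / (v - 2)\n"
)


def failing_line(source):
    """Run source; return (line number, exception) for the first runtime error."""
    try:
        exec(compile(source, SNIPPET_NAME, "exec"), {})
    except Exception as exc:
        # Keep only frames that belong to the snippet itself.
        frames = [
            frame
            for frame in traceback.extract_tb(exc.__traceback__)
            if frame.filename == SNIPPET_NAME
        ]
        return (frames[-1].lineno if frames else None), exc
    return None, None


def innermost_statement(source, line):
    """Return the statement node with the smallest line span covering line."""
    best = None
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.stmt):
            continue
        if node.lineno <= line <= node.end_lineno:
            span = node.end_lineno - node.lineno
            if best is None or span < best.end_lineno - best.lineno:
                best = node
    return best


def describe_bug(source):
    """Summarize the failing statement, or return None if nothing fails."""
    line, exc = failing_line(source)
    if line is None:
        return None
    node = innermost_statement(source, line)
    return {
        "line": line,
        "node_type": type(node).__name__,
        "code": ast.get_source_segment(source, node),
        "error": f"{type(exc).__name__}: {exc}",
    }


if __name__ == "__main__":
    print(describe_bug(BUGGY))
```

For the toy snippet, the failing line sits inside the `for` loop, and the sketch reports the inner augmented assignment rather than the whole loop. A summary like this could be placed in a repair prompt next to the error message. Executing untrusted generated code with `exec` is unsafe outside a sandbox, so this is suitable only as a demonstration.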

Taxonomy

Topics: Statistics Education and Methodologies · Scientific Computing and Data Management