CodeRefine: A Pipeline for Enhancing LLM-Generated Code Implementations of Research Papers
Ekaterina Trofimova, Emil Sataev, Abhijit Singh Jowhari

TL;DR
CodeRefine is a comprehensive framework that leverages LLMs and structured knowledge extraction to automatically generate and improve code implementations from research papers, bridging the gap between theory and practice.
Contribution
It introduces a multi-step pipeline combining text summarization, knowledge graph creation, and retrieval-augmented generation for more accurate code synthesis from scientific literature.
Findings
Improves code accuracy over zero-shot LLM prompts
Effectively extracts and structures key paper information
Enhances code quality with retrieval-augmented generation
Abstract
This paper presents CodeRefine, a novel framework for automatically transforming research paper methodologies into functional code using Large Language Models (LLMs). Our multi-step approach first extracts and summarizes key text chunks from papers, analyzes their code relevance, and creates a knowledge graph using a predefined ontology. Code is then generated from this structured representation and enhanced through a proposed retrospective retrieval-augmented generation approach. CodeRefine addresses the challenge of bridging theoretical research and practical implementation, offering a more accurate alternative to zero-shot LLM prompting. Evaluations on diverse scientific papers demonstrate CodeRefine's ability to improve the code implementations it generates from papers, potentially accelerating the adoption of cutting-edge algorithms in real-world applications.
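The stages named in the abstract (chunk summarization, code-relevance filtering, ontology-constrained knowledge graph construction, code generation, retrospective retrieval-augmented refinement) can be wired together roughly as in the sketch below. It is not the authors' implementation: every function name, the toy `ONTOLOGY`, the blank-line chunking, and the `LLM` callable type are hypothetical, and the assumption that "retrospective" retrieval is keyed on the first-draft code is ours, not confirmed by the abstract.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]

# Hypothetical toy ontology of allowed (subject type, relation, object type)
# patterns; the paper's actual ontology is not reproduced here.
ONTOLOGY = {
    ("Method", "uses", "Component"),
    ("Component", "has_parameter", "Parameter"),
    ("Method", "optimizes", "Objective"),
}


@dataclass(frozen=True)
class Triple:
    subj: str
    subj_type: str
    rel: str
    obj: str
    obj_type: str


def split_chunks(paper_text: str) -> list[str]:
    # Naive chunking on blank lines (an assumption, not the paper's scheme).
    return [c.strip() for c in paper_text.split("\n\n") if c.strip()]


def summarize_relevant(chunks: list[str], llm: LLM,
                       is_relevant: Callable[[str], bool]) -> list[str]:
    # Summarize every chunk, then keep only summaries judged code-relevant.
    summaries = [llm(f"Summarize:\n{c}") for c in chunks]
    return [s for s in summaries if is_relevant(s)]


def build_graph(triples: list[Triple]) -> list[Triple]:
    # Keep only triples whose type pattern the ontology allows.
    return [t for t in triples if (t.subj_type, t.rel, t.obj_type) in ONTOLOGY]


def graph_to_prompt(graph: list[Triple]) -> str:
    # Serialize the graph as edge lines for the code-generation prompt.
    edges = [f"{t.subj} --{t.rel}--> {t.obj}" for t in graph]
    return "Implement the method described by:\n" + "\n".join(edges)


def retrospective_rag(draft: str, retrieve: Callable[[str], list[str]],
                      llm: LLM) -> str:
    # Assumption: retrieval is keyed on the first draft of the code, and the
    # retrieved snippets are fed back to the LLM to refine that draft.
    refs = "\n".join(retrieve(draft))
    return llm(f"Refine the draft using these references:\n{refs}\nDraft:\n{draft}")


def coderefine_sketch(paper_text: str, llm: LLM,
                      is_relevant: Callable[[str], bool],
                      extract_triples: Callable[[str], list[Triple]],
                      retrieve: Callable[[str], list[str]]) -> str:
    # Chain the stages: summarize/filter -> graph -> draft code -> refine.
    summaries = summarize_relevant(split_chunks(paper_text), llm, is_relevant)
    triples = [t for s in summaries for t in extract_triples(s)]
    draft = llm(graph_to_prompt(build_graph(triples)))
    return retrospective_rag(draft, retrieve, llm)
```

Passing the LLM, relevance judge, triple extractor, and retriever in as callables keeps each stage swappable (for example, an LLM-based relevance judge instead of a keyword check) and lets the skeleton run end to end with simple stubs.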
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Mathematics, Computing, and Information Processing
