Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding
Ziv Nevo, Orna Raz, Karen Yorav

TL;DR
This paper presents a system that leverages GitHub artifacts like pull requests, issues, and commit messages to improve large language models' understanding of code, providing more grounded and helpful explanations.
Contribution
It introduces a novel method combining GitHub context extraction with LLMs to generate and validate code explanations, enhancing interpretability in software development.
Findings
Generated explanations are often helpful and non-trivial.
The system reduces hallucinations in code explanations.
A user study indicates that the generated insights improve code understanding.
Abstract
Understanding the purpose of source code is a critical task in software maintenance, onboarding, and modernization. While large language models (LLMs) have shown promise in generating code explanations, they often lack grounding in the broader software engineering context. We propose a novel approach that leverages natural language artifacts from GitHub -- such as pull request descriptions, issue descriptions and discussions, and commit messages -- to enhance LLM-based code understanding. Our system consists of three components: one that extracts and structures relevant GitHub context, another that uses this context to generate high-level explanations of the code's purpose, and a third that validates the explanation. We implemented this as a standalone tool, as well as a server within the Model Context Protocol (MCP), enabling integration with other AI-assisted development tools. Our…
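The three components named in the abstract (context extraction, explanation generation, validation) can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions, not the authors' implementation: every name here (GitHubContext, extract_context, generate_explanation, validate_explanation, explain), the artifact dictionary shape, the prompt wording, and the YES/NO validation scheme are hypothetical. The LLM is passed in as a plain callable so that no particular model API is assumed, and the artifacts are assumed to have been fetched from GitHub already.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical sketch: all names and prompts below are assumptions,
# not taken from the paper.
LLM = Callable[[str], str]  # any function mapping a prompt to a completion


@dataclass
class GitHubContext:
    """Natural-language artifacts linked to a piece of code."""
    commit_messages: List[str] = field(default_factory=list)
    pr_descriptions: List[str] = field(default_factory=list)
    issue_texts: List[str] = field(default_factory=list)

    def as_text(self) -> str:
        """Render the non-empty artifact groups as one prompt-ready string."""
        sections = [
            ("Commit messages", self.commit_messages),
            ("Pull request descriptions", self.pr_descriptions),
            ("Issue descriptions and discussions", self.issue_texts),
        ]
        lines: List[str] = []
        for title, items in sections:
            if items:
                lines.append(f"## {title}")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


def extract_context(raw_artifacts: List[dict]) -> GitHubContext:
    """Component 1: group already-fetched artifacts by kind.

    Each dict is assumed to look like
    {"kind": "commit" | "pr" | "issue", "text": "..."}.
    Unknown kinds and blank texts are skipped.
    """
    ctx = GitHubContext()
    buckets = {
        "commit": ctx.commit_messages,
        "pr": ctx.pr_descriptions,
        "issue": ctx.issue_texts,
    }
    for artifact in raw_artifacts:
        bucket = buckets.get(artifact.get("kind"))
        text = artifact.get("text", "").strip()
        if bucket is not None and text:
            bucket.append(text)
    return ctx


def generate_explanation(code: str, ctx: GitHubContext, llm: LLM) -> str:
    """Component 2: ask the LLM for the code's purpose, grounded in context."""
    prompt = (
        "Explain the high-level purpose of the code below. "
        "Use only what the code and the GitHub context support.\n\n"
        f"# GitHub context\n{ctx.as_text()}\n\n# Code\n{code}\n"
    )
    return llm(prompt).strip()


def validate_explanation(explanation: str, code: str,
                         ctx: GitHubContext, llm: LLM) -> bool:
    """Component 3: a second LLM pass judging whether the explanation is supported."""
    prompt = (
        "Is the explanation supported by the code and the GitHub context? "
        "Answer YES or NO.\n\n"
        f"# Explanation\n{explanation}\n\n"
        f"# GitHub context\n{ctx.as_text()}\n\n# Code\n{code}\n"
    )
    return llm(prompt).strip().upper().startswith("YES")


def explain(code: str, raw_artifacts: List[dict], llm: LLM) -> Tuple[str, bool]:
    """Run the three components in order; return (explanation, is_validated)."""
    ctx = extract_context(raw_artifacts)
    explanation = generate_explanation(code, ctx, llm)
    return explanation, validate_explanation(explanation, code, ctx, llm)
```

Keeping the validator as a separate pass over the same context means an unsupported explanation can be flagged or regenerated rather than shown to the user as-is. How the actual system performs validation is not specified in the text above.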
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Software Engineering Techniques and Practices
