Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding

Ziv Nevo; Orna Raz; Karen Yorav

arXiv:2511.03549·cs.SE·November 6, 2025

Open Access

TL;DR

This paper presents a system that leverages GitHub artifacts like pull requests, issues, and commit messages to improve large language models' understanding of code, providing more grounded and helpful explanations.

Contribution

The paper introduces a method that combines GitHub context extraction with LLMs to generate and then validate explanations of a code's purpose, with the goal of making code easier to interpret during software development.

Findings

01. Generated explanations are often helpful and non-trivial.

02. The system reduces hallucinations in code explanations.

03. A user study indicates that the insights improve understanding of the code.

Abstract

Understanding the purpose of source code is a critical task in software maintenance, onboarding, and modernization. While large language models (LLMs) have shown promise in generating code explanations, they often lack grounding in the broader software engineering context. We propose a novel approach that leverages natural language artifacts from GitHub -- such as pull request descriptions, issue descriptions and discussions, and commit messages -- to enhance LLM-based code understanding. Our system consists of three components: one that extracts and structures relevant GitHub context, another that uses this context to generate high-level explanations of the code's purpose, and a third that validates the explanation. We implemented this as a standalone tool, as well as a server within the Model Context Protocol (MCP), enabling integration with other AI-assisted development tools. Our…
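The three-component design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `GitHubContext` schema, the prompt wording, the YES/NO validation check, and every function name below are assumptions made for illustration, and the model is passed in as a plain callable so that no particular LLM API is implied.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GitHubContext:
    """Natural-language artifacts linked to a piece of code (hypothetical schema)."""
    commit_messages: List[str] = field(default_factory=list)
    pr_descriptions: List[str] = field(default_factory=list)
    issue_texts: List[str] = field(default_factory=list)


def build_prompt(code: str, ctx: GitHubContext) -> str:
    """Combine the code with whatever structured context is available."""
    sections = [
        ("Commit messages", ctx.commit_messages),
        ("Pull request descriptions", ctx.pr_descriptions),
        ("Issues and discussions", ctx.issue_texts),
    ]
    parts = ["Explain the purpose of this code:", code]
    for title, items in sections:
        if items:  # skip artifact types with no content
            parts.append(title + ":")
            parts.extend("- " + item for item in items)
    return "\n".join(parts)


def explain(code: str, ctx: GitHubContext, llm: Callable[[str], str]) -> str:
    """Generation step: ask the model for a high-level explanation."""
    return llm(build_prompt(code, ctx))


def validate(explanation: str, ctx: GitHubContext, judge: Callable[[str], str]) -> bool:
    """Validation step (assumed form): ask whether the evidence supports the text."""
    evidence = "\n".join(ctx.commit_messages + ctx.pr_descriptions + ctx.issue_texts)
    prompt = (
        "Is the explanation supported by the evidence? Answer YES or NO.\n"
        f"Explanation: {explanation}\nEvidence:\n{evidence}"
    )
    return judge(prompt).strip().upper().startswith("YES")


if __name__ == "__main__":
    # Stub callables stand in for real model calls.
    ctx = GitHubContext(commit_messages=["Retry uploads on timeout"])
    text = explain("def upload(f): ...", ctx, llm=lambda p: "Adds retry logic to uploads.")
    print(text, validate(text, ctx, judge=lambda p: "YES"))
```

Keeping extraction, generation, and validation as separate functions mirrors the separation the abstract describes, and lets each step be replaced independently, for example by swapping the stub callables for real model clients.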

Taxonomy

Topics: Software Engineering Research · Scientific Computing and Data Management · Software Engineering Techniques and Practices