Fine-grained Multi-Document Extraction and Generation of Code Change Rationale

Mehedi Sun, Antu Saha, Nadeeshan De Silva, Antonio Mastropaolo, and Oscar Chaparro

arXiv:2604.10345 · cs.SE · April 14, 2026


TL;DR

This paper investigates how rationale components behind code changes are scattered across artifacts and introduces ARGUS, an LLM-based system that synthesizes these components into concise summaries to aid software maintenance.

Contribution

The paper provides an empirical analysis of rationale distribution in software artifacts and proposes ARGUS, a novel LLM-based approach for multi-document rationale extraction and summarization.

Findings

1. Rationale components are highly fragmented across artifacts.
2. No single artifact captures all rationale components.
3. ARGUS achieves over 51% precision and 93% recall in rationale identification.
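For reference, precision and recall follow their standard definitions: precision is the share of flagged items that are truly rationale, and recall is the share of true rationale items that were flagged. The sketch below assumes, purely for illustration, that identification is scored per text unit (e.g., per sentence) against human annotations; the sentence IDs and the function name `precision_recall` are hypothetical and not taken from the paper.

```python
def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Standard precision/recall over sets of identified text units.

    Assumption: each unit (e.g., a hypothetical sentence ID) is either
    rationale or not; this scoring scheme is illustrative only.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: 4 sentences flagged, 3 of them truly rationale,
# out of 3 annotated rationale sentences in total.
predicted = {"s1", "s2", "s3", "s7"}
gold = {"s1", "s2", "s3"}
print(precision_recall(predicted, gold))  # (0.75, 1.0)
```

Read this way, high recall with moderate precision means few true rationale units are missed, while a share of the flagged units are false positives.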

Abstract

Understanding the reasons behind past code changes is critical for many software engineering tasks, including refactoring and reviewing code, diagnosing bugs, and implementing new features. Unfortunately, locating and reconstructing this rationale can be difficult for developers because the information is often fragmented, inconsistently documented, and scattered across different artifacts such as commit messages, issue reports, and pull requests (PRs). In this paper, we address this challenge in two steps. First, we conduct an empirical study of 63 commits from five open-source Java projects to analyze how rationale components (e.g., a change's goal, need, and alternative) are distributed across artifacts. We find that the rationale is highly fragmented: commit messages and PRs primarily capture goals, while needs and alternatives are more often found in issues and PRs. Other components are…
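The distribution analysis described in the abstract can be pictured as a tally of which rationale components appear in which artifact. The sketch below is a minimal illustration under stated assumptions: the annotations are hypothetical toy data (not the study's), and the helper names `component_counts` and `complete_single_artifacts` are invented for this example. Only the component names (goal, need, alternative) and artifact types come from the abstract.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations: (commit ID, artifact type, rationale component).
ANNOTATIONS = [
    ("c1", "commit", "goal"),
    ("c1", "issue", "need"),
    ("c1", "pr", "goal"),
    ("c1", "pr", "alternative"),
    ("c2", "commit", "goal"),
    ("c2", "issue", "need"),
    ("c2", "issue", "alternative"),
]
COMPONENTS = frozenset({"goal", "need", "alternative"})


def component_counts(rows):
    """Count how often each (artifact type, component) pair is annotated."""
    return Counter((artifact, component) for _commit, artifact, component in rows)


def complete_single_artifacts(rows, components=COMPONENTS):
    """Return (commit, artifact) pairs where one artifact alone holds every component."""
    found = defaultdict(set)
    for commit, artifact, component in rows:
        found[(commit, artifact)].add(component)
    return sorted(key for key, comps in found.items() if components <= comps)


counts = component_counts(ANNOTATIONS)
print(counts[("commit", "goal")])  # 2
print(complete_single_artifacts(ANNOTATIONS))  # []
```

In this toy data, each commit's rationale becomes complete only when its commit message, issue, and PR are combined, which is the kind of fragmentation the study reports.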
