Generating Software Architecture Description from Source Code using Reverse Engineering and Large Language Model
Ahmad Hatahet, Christoph Knieke, Andreas Rausch

TL;DR
This paper presents a semi-automated method combining reverse engineering and large language models to generate comprehensive software architecture descriptions directly from source code, improving maintainability and reducing manual effort.
Contribution
It introduces a novel approach that integrates reverse engineering with LLMs to recover static and behavioral architectural views from source code in a semi-automated way.
Findings
Successfully generates component diagrams from C++ source code.
Accurately models component behavior using few-shot prompting.
Reduces manual effort in architectural documentation.
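To illustrate what few-shot prompting for behavior modeling can look like, here is a minimal sketch that assembles a prompt from a handful of worked code-to-behavior examples followed by the component to be described. The names `Example` and `build_few_shot_prompt`, the instruction wording, and the output format are assumptions made for illustration; the paper's actual prompts are not shown in this summary.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One worked example: a code snippet and its hand-written behavior description."""
    code: str
    behavior: str


def build_few_shot_prompt(examples, target_code):
    """Concatenate an instruction, the worked examples, and the new code to describe.

    The instruction text below is a hypothetical placeholder, not the paper's prompt.
    """
    parts = ["Describe the behavior of the C++ component as a list of state transitions."]
    for i, ex in enumerate(examples, start=1):
        parts.append(f"Example {i}\nCode:\n{ex.code}\nBehavior:\n{ex.behavior}")
    # The prompt ends with an open "Behavior:" slot for the model to complete.
    parts.append(f"New component\nCode:\n{target_code}\nBehavior:")
    return "\n\n".join(parts)
```

The resulting string would be sent to whichever LLM is in use; keeping prompt assembly separate from the model call makes it easy to vary the number of examples.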
Abstract
Software Architecture Descriptions (SADs) are essential for managing the inherent complexity of modern software systems. They enable high-level architectural reasoning, guide design decisions, and facilitate effective communication among diverse stakeholders. However, in practice, SADs are often missing, outdated, or poorly aligned with the system's actual implementation. Consequently, developers are compelled to derive architectural insights directly from source code, a time-intensive process that increases cognitive load, slows new developer onboarding, and contributes to the gradual degradation of clarity over the system's lifetime. To address these issues, we propose a semi-automated generation of SADs from source code by integrating reverse engineering (RE) techniques with a Large Language Model (LLM). Our approach recovers both static and behavioral architectural views by…
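As a rough illustration of recovering a static view by reverse engineering, the sketch below derives component dependencies from `#include` directives and emits a PlantUML component diagram. This is an assumption-laden simplification, not the paper's method: it treats each top-level directory as one component, assumes quoted include paths are written relative to the project root, and the function names (`component_of`, `component_dependencies`, `to_plantuml`) are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Matches quoted includes such as: #include "sensors/speed.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def component_of(path):
    """Assumption: the top-level directory names the component."""
    parts = PurePosixPath(path).parts
    return parts[0] if len(parts) > 1 else "root"


def component_dependencies(sources):
    """Map {file path: file text} to a set of (from_component, to_component) edges."""
    deps = set()
    for path, text in sources.items():
        src = component_of(path)
        for included in INCLUDE_RE.findall(text):
            dst = component_of(included)
            if dst != src:  # ignore includes within the same component
                deps.add((src, dst))
    return deps


def to_plantuml(deps):
    """Render the dependency edges as a PlantUML component diagram."""
    names = sorted({name for edge in deps for name in edge})
    lines = ["@startuml"]
    lines += [f"[{name}]" for name in names]
    lines += [f"[{a}] --> [{b}]" for a, b in sorted(deps)]
    lines.append("@enduml")
    return "\n".join(lines)
```

Include-based extraction captures only compile-time dependencies; a fuller pipeline would likely rely on a proper C++ parser.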
Taxonomy
Topics: Advanced Software Engineering Methodologies · Software Engineering Research · Software System Performance and Reliability
