ArchAgent: Scalable Legacy Software Architecture Recovery with LLMs
Rusheng Pan, Bingcheng Mao, Tianyi Ma, Zhenhua Ling

TL;DR
ArchAgent is a scalable framework that leverages static analysis and LLMs to accurately recover multi-view, business-aligned software architectures from large, complex legacy codebases, addressing challenges like architectural drift and missing relations.
Contribution
It introduces a novel agent-based approach combining static analysis, adaptive segmentation, and LLM synthesis for architecture recovery, with scalable diagram generation and cross-repository data integration.
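The summary does not specify how the static analysis and segmentation stages are implemented, so the following is only a minimal sketch of the general idea, not ArchAgent's actual pipeline. It assumes Python sources, uses the standard `ast` module as a stand-in for the static analyzer, and packs modules greedily under a word-count budget as a stand-in for adaptive segmentation. All function names, the toy repository, and the budget value are hypothetical.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Static analysis stand-in: top-level module names imported by one file."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def build_dependency_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each module to the in-repo modules it imports (external imports dropped)."""
    return {
        name: imported_modules(src) & (files.keys() - {name})
        for name, src in files.items()
    }


def estimate_tokens(source: str) -> int:
    """Crude size proxy (whitespace-separated words); a real tokenizer would differ."""
    return len(source.split())


def segment(files: dict[str, str], budget: int) -> list[list[str]]:
    """Greedily pack modules, in name order, into segments within the size budget."""
    segments: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name in sorted(files):
        cost = estimate_tokens(files[name])
        if current and used + cost > budget:
            segments.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        segments.append(current)
    return segments


def dependency_context(seg: list[str], graph: dict[str, set[str]]) -> list[str]:
    """Dependency edges leaving the segment, to accompany it in an LLM prompt."""
    inside = set(seg)
    return sorted(f"{m} -> {d}" for m in seg for d in sorted(graph[m]) if d not in inside)


# Hypothetical three-module repository used only for illustration.
files = {
    "billing": "import orders\nimport json\n\ndef charge(): return orders.total()\n",
    "orders": "from inventory import stock\n\ndef total(): return stock()\n",
    "inventory": "def stock(): return 3\n",
}
graph = build_dependency_graph(files)
segments = segment(files, budget=12)
for seg in segments:
    print(seg, dependency_context(seg, graph))
```

Each segment would then be sent to the LLM together with its cross-segment edges, which is one plausible way to supply the dependency context the findings refer to when a codebase does not fit in a single prompt.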
Findings
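Evaluations on typical large-scale GitHub projects show significant improvements in architecture recovery accuracy over existing benchmarks.
An ablation study confirms that dependency context improves the accuracy of generated architectures.
A real-world case study demonstrates effective recovery of critical business logic from legacy projects.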
Abstract
Recovering accurate architecture from large-scale legacy software is hindered by architectural drift, missing relations, and the limited context of Large Language Models (LLMs). We present ArchAgent, a scalable agent-based framework that combines static analysis, adaptive code segmentation, and LLM-powered synthesis to reconstruct multi-view, business-aligned architectures from cross-repository codebases. ArchAgent introduces scalable diagram generation with contextual pruning and integrates cross-repository data to identify business-critical modules. Evaluations on typical large-scale GitHub projects show significant improvements over existing benchmarks. An ablation study confirms that dependency context improves the accuracy of architectures generated for production-level repositories, and a real-world case study demonstrates effective recovery of critical business logic from legacy…
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Advanced Software Engineering Methodologies
