Who's Who? LLM-assisted Software Traceability with Architecture Entity Recognition
Dominik Fuchß, Haoyu Liu, Sophie Corallo, Tobias Hey, Jan Keim, Johannes von Geisau, Anne Koziolek

TL;DR
This paper demonstrates how LLMs can automatically identify architectural entities in software documentation and code, enabling automated traceability link recovery without manual model creation.
Contribution
It introduces the ArTEMiS approach for entity matching and extends the ExArch method for automatic SAM extraction, improving traceability in software engineering.
Findings
ExArch achieves an F1 score of 0.86 using only SAD and code.
ArTEMiS matches heuristic approaches with an F1 score of 0.81.
The combined approaches outperform the baseline without requiring manual SAMs.
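For readers unfamiliar with the metric, the F1 scores above are the harmonic mean of precision and recall over recovered trace links. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `f1_score` and the example links are hypothetical and chosen only for illustration.

```python
def f1_score(predicted, gold):
    """Return (precision, recall, F1) for two collections of trace links."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical links: (documentation sentence, code file). Not data from the paper.
gold_links = {
    ("sentence 3", "auth/Login.java"),
    ("sentence 7", "db/Store.java"),
    ("sentence 9", "ui/View.java"),
}
predicted_links = {
    ("sentence 3", "auth/Login.java"),
    ("sentence 7", "db/Store.java"),
    ("sentence 8", "ui/View.java"),  # wrong sentence, counts as a false positive
}

precision, recall, f1 = f1_score(predicted_links, gold_links)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Treating links as set elements means a link counts as correct only if both the documentation side and the code side match the gold standard exactly.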
Abstract
Identifying architecturally relevant entities in textual artifacts is crucial for Traceability Link Recovery (TLR) between Software Architecture Documentation (SAD) and source code. While Software Architecture Models (SAMs) can bridge the semantic gap between these artifacts, their manual creation is time-consuming. LLMs offer new capabilities for extracting architectural entities from SAD and source code to construct SAMs automatically or establish direct trace links. This paper extends our ICSA 2025 paper [19], which introduced Extracting Architecture (ExArch) for LLM-based architecture component name extraction. The extension contributes the novel Architecture Traceability with Entity Matching via Semantic inference (ArTEMiS) approach, an extended evaluation with additional LLMs, configurations, a revised benchmark, and a combined evaluation of both approaches. Specifically, this…
