Evaluating the Use of LLMs for Documentation to Code Traceability
Ebube Alor, SayedHassan Khatoonabadi, Emad Shihab

TL;DR
This paper evaluates how well large language models automate documentation-to-code traceability, showing that they substantially outperform TF-IDF, BM25, and CodeBERT baselines and identifying key limitations for future improvement.
Contribution
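It provides a comprehensive assessment of three LLMs (Claude 3.5 Sonnet, GPT-4o, and o3-mini) for traceability, introduces two new datasets built from the open-source projects Unity Catalog and Crawl4AI, and analyzes the models' capabilities and errors in establishing trace links.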
Findings
The best-performing LLM achieves F1-scores of 79.4% and 80.4% on the two datasets in trace link identification.
Fully correct relationship explanations range from 42.9% to 71.1%, while partial accuracy exceeds 97%.
Performance varies in multi-step chain reconstruction.
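The F1-scores reported above combine precision and recall over predicted trace links. As a minimal sketch of how such a score can be computed, assuming each trace link is represented as a (documentation section, code element) pair and that predictions are compared against a gold set by exact match (the identifiers below are hypothetical and are not taken from the paper's datasets):

```python
from typing import Dict, Set, Tuple

# Assumption: a trace link is a (documentation section id, code element id) pair.
TraceLink = Tuple[str, str]


def trace_link_scores(predicted: Set[TraceLink], gold: Set[TraceLink]) -> Dict[str, float]:
    """Compute precision, recall, and F1 by exact match against the gold links."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold and predicted links, for illustration only.
gold_links = {
    ("api_reference#create", "module_a.create_item"),
    ("api_reference#list", "module_a.list_items"),
    ("user_guide#quickstart", "module_b.run"),
}
predicted_links = {
    ("api_reference#create", "module_a.create_item"),
    ("user_guide#quickstart", "module_b.run"),
    ("user_guide#quickstart", "module_b.stop"),  # a spurious link
}

scores = trace_link_scores(predicted_links, gold_links)
print(scores)
```

In this toy case two of three predicted links are correct and two of three gold links are recovered, so precision, recall, and F1 are all 2/3.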
Abstract
Large Language Models (LLMs) offer new potential for automating documentation-to-code traceability, yet their capabilities remain underexplored. We present a comprehensive evaluation of LLMs (Claude 3.5 Sonnet, GPT-4o, and o3-mini) in establishing trace links between various software documentation (including API references and user guides) and source code. We create two novel datasets from two open-source projects (Unity Catalog and Crawl4AI). Through systematic experiments, we assess three key capabilities: (1) trace link identification accuracy, (2) relationship explanation quality, and (3) multi-step chain reconstruction. Results show that the best-performing LLM achieves F1-scores of 79.4% and 80.4% across the two datasets, substantially outperforming our baselines (TF-IDF, BM25, and CodeBERT). While fully correct relationship explanations range from 42.9% to 71.1%, partial accuracy…
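To give a sense of what a lexical baseline such as TF-IDF does in this setting, the following is a minimal sketch that ranks code snippets by cosine similarity to a documentation passage. It is not the paper's baseline configuration: the tokenizer, the smoothed IDF formula, and the example snippets are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and split on non-alphanumerics, so create_table -> create, table.
    return [t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t]


def tfidf_vectors(texts: List[str]) -> List[Dict[str, float]]:
    # One sparse TF-IDF vector per text, using a smoothed IDF (an assumed variant).
    counts = [Counter(tokenize(t)) for t in texts]
    n = len(texts)
    doc_freq = Counter(term for c in counts for term in c)
    idf = {term: math.log((1 + n) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    return [{term: tf * idf[term] for term, tf in c.items()} for c in counts]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_code_for_doc(doc_text: str, code_snippets: Dict[str, str]) -> List[Tuple[str, float]]:
    # Rank code snippets by similarity to the documentation passage, best first.
    names = list(code_snippets)
    vectors = tfidf_vectors([doc_text] + [code_snippets[n] for n in names])
    doc_vec, code_vecs = vectors[0], vectors[1:]
    scored = [(name, cosine(doc_vec, vec)) for name, vec in zip(names, code_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical snippets, for illustration only.
code_snippets = {
    "create_table": "def create_table(name, schema): register a new table in the catalog",
    "fetch_page": "def fetch_page(url): download the page and return html",
}
doc_passage = "Use create_table to register a table with a schema in the catalog."
ranking = rank_code_for_doc(doc_passage, code_snippets)
print(ranking)
```

A baseline of this kind links a passage to code only when they share vocabulary, which is one plausible reason the LLMs in the abstract outperform it by a wide margin.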
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Advanced Malware Detection Techniques
