Evaluating the Use of LLMs for Documentation to Code Traceability

Ebube Alor; SayedHassan Khatoonabadi; Emad Shihab

arXiv:2506.16440 · cs.SE · August 8, 2025

TL;DR

This paper evaluates how effectively large language models automate documentation-to-code traceability, showing high accuracy in trace link identification and pinpointing key limitations for future improvement.

Contribution

It provides a comprehensive assessment of LLMs for documentation-to-code traceability, introduces two new datasets, and analyzes the models' capabilities and errors in establishing trace links.

Findings

1. LLMs achieve F1-scores of up to 80.4% in trace link identification.
2. Partial relationship explanations are highly accurate (>97%).
3. Performance varies in multi-step chain reconstruction.
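The headline result is reported as an F1-score. As a minimal sketch of how such a score can be computed, assuming predicted and ground-truth trace links are compared as sets of (documentation artifact, code artifact) pairs, the snippet below derives precision, recall, and F1. The pair representation, the file names, and the function name `precision_recall_f1` are hypothetical; the paper's exact matching and averaging scheme may differ.

```python
from typing import Set, Tuple

# Assumption: a trace link is modeled as a (documentation artifact, code artifact)
# pair of identifiers. This is an illustrative representation, not the paper's format.
TraceLink = Tuple[str, str]


def precision_recall_f1(
    predicted: Set[TraceLink], gold: Set[TraceLink]
) -> Tuple[float, float, float]:
    """Set-based (micro) precision, recall, and F1 over trace links."""
    # A predicted link counts as correct only if the exact pair is in the gold set.
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical gold links and model predictions.
    gold = {
        ("api_reference.md", "catalog.py"),
        ("api_reference.md", "schema.py"),
        ("user_guide.md", "crawler.py"),
    }
    predicted = {
        ("api_reference.md", "catalog.py"),
        ("user_guide.md", "crawler.py"),
        ("user_guide.md", "utils.py"),
    }
    p, r, f = precision_recall_f1(predicted, gold)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

With two of the three predicted links correct and two of the three gold links recovered, precision, recall, and F1 all come out to 0.667.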

Abstract

Large Language Models (LLMs) offer new potential for automating documentation-to-code traceability, yet their capabilities remain underexplored. We present a comprehensive evaluation of LLMs (Claude 3.5 Sonnet, GPT-4o, and o3-mini) in establishing trace links between various types of software documentation (including API references and user guides) and source code. We create two novel datasets from two open-source projects (Unity Catalog and Crawl4AI). Through systematic experiments, we assess three key capabilities: (1) trace link identification accuracy, (2) relationship explanation quality, and (3) multi-step chain reconstruction. Results show that the best-performing LLM achieves F1-scores of 79.4% and 80.4% on the two datasets, substantially outperforming our baselines (TF-IDF, BM25, and CodeBERT). While fully correct relationship explanations range from 42.9% to 71.1%, partial accuracy…
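TF-IDF is one of the baselines the LLMs are compared against. The sketch below, assuming a simple setup in which one documentation passage is scored against candidate code files by the cosine similarity of their TF-IDF vectors, shows how such a lexical baseline ranks trace link candidates. The tokenizer (which splits camelCase and snake_case identifiers), the smoothed IDF formula, the function names, and the example file names are all illustrative assumptions, not the authors' configuration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercased alphanumeric tokens; camelCase and snake_case identifiers are split."""
    # Insert a space at lower-to-upper boundaries, e.g. "createTable" -> "create Table".
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    # Underscores are not matched, so "create_table" yields "create" and "table".
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Raw term frequency times smoothed IDF: ln((1 + n) / (1 + df)) + 1 (an assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {term: count * (math.log((1 + n) / (1 + df[term])) + 1.0) for term, count in tf.items()}
        )
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_code_files(doc_text: str, code_files: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank hypothetical code files by TF-IDF similarity to one documentation passage."""
    names = list(code_files)
    # The documentation passage and all code files share one corpus for IDF statistics.
    corpus = [tokenize(doc_text)] + [tokenize(code_files[name]) for name in names]
    vectors = tfidf_vectors(corpus)
    query, candidates = vectors[0], vectors[1:]
    scores = [(name, cosine(query, vec)) for name, vec in zip(names, candidates)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    doc = "Use createTable to register a new table in the catalog schema."
    code = {
        "tables.py": "def create_table(catalog, schema, name): ...",
        "crawler.py": "async def crawl_page(url): return await fetch(url)",
    }
    for name, score in rank_code_files(doc, code):
        print(f"{name}: {score:.3f}")
```

Because this baseline only rewards shared vocabulary, `tables.py` ranks first while `crawler.py`, which shares no terms with the passage, scores exactly 0 regardless of any semantic relationship.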

Code & Models

Repositories

alor-e/evaluating-llm-doc-code-traceability (official)

Taxonomy

Topics: Software Engineering Research · Scientific Computing and Data Management · Advanced Malware Detection Techniques