Using LLMs to Evaluate Architecture Documents: Results from a Digital Marketplace Environment
Frank Elberzhager, Matthias Gerbershagen, Joshua Ginkel

TL;DR
This study investigates how large language models (LLMs) can support software architects in evaluating architecture documents. It finds that the quality of LLM evaluations correlates with the quality of the documents being evaluated, but also that the LLMs show inconsistencies that require further research.
Contribution
The study provides empirical insights into the effectiveness and limitations of LLMs for evaluating architecture documents in a digital marketplace context.
Findings
LLMs' evaluation quality improves with higher document quality
LLMs show promising potential but exhibit inconsistencies
Better document quality leads to more consistent LLM and human evaluations
Abstract
Generative AI plays an increasing role in software engineering activities, e.g., to make them more efficient or to improve their quality. However, it is often unclear how much benefit LLMs really provide. We focus on software architects and investigate how an LLM-supported evaluation of architecture documents can help them improve such artifacts. In a research project in which a digital marketplace is being developed and digital solutions are to be analyzed, we used different LLMs to assess the quality of architecture documents and compared the results with evaluations by software architects. We found that the quality of the artifact strongly influences the quality of the LLM-based evaluation: the better the architecture document, the more consistent the LLM-based and human expert evaluations were. While using LLMs…
