Using LLMs to Evaluate Architecture Documents: Results from a Digital Marketplace Environment

Frank Elberzhager; Matthias Gerbershagen; Joshua Ginkel

arXiv:2601.19693·cs.SE·January 28, 2026

Open Access

TL;DR

This study investigates how large language models (LLMs) can support software architects in evaluating architecture documents. It finds that agreement between LLM-based and human expert evaluations increases with document quality, but the LLM results also show inconsistencies that need further research.

Contribution

The paper provides empirical insights into the effectiveness and limitations of LLMs in architecture document evaluation within a digital marketplace context.

Findings

01. LLMs' evaluation quality improves with higher document quality

02. LLMs show promising potential but exhibit inconsistencies

03. Better document quality leads to more consistent LLM and human evaluations

Abstract

Generative AI plays an increasing role in software engineering activities, e.g., to make them more efficient or to improve quality. However, it is often unclear how much benefit LLMs really provide. We concentrate on software architects and investigate how an LLM-supported evaluation of architecture documents can help them improve such artifacts. In the context of a research project in which a digital marketplace is developed and digital solutions are to be analyzed, we used different LLMs to analyze the quality of architecture documents and compared the results with evaluations from software architects. We found that the quality of the artifact has a strong influence on the quality of the LLM-based evaluation, i.e., the better the quality of the architecture document, the more consistent the LLM-based evaluation and the human expert evaluation were. While using LLMs…

Taxonomy

Topics: Software Engineering Research · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education