CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models

Tim Lukas Adam; Phongsakon Mark Konrad; Riccardo Terrenzi; Florian Girardo Lukas; Rahime Yilmaz; Krzysztof Sierszecki; Serkan Ayvaz

arXiv:2604.05755·cs.SE·April 8, 2026

TL;DR

The paper introduces CAKE, a benchmark that evaluates how well large language models understand cloud-native software architecture across four cognitive levels and five topics.

Contribution

The paper contributes a benchmark of 188 expert-validated questions, evaluates 22 model configurations (0.5B to 70B parameters) from four LLM families, and analyzes how question formats and augmentations affect model performance.

Findings

01

MCQ accuracy plateaus above 3B parameters, with the best model reaching 99.2%.

02

Free-response scores increase steadily across cognitive levels.

03

Different formats reveal different aspects of model knowledge.

Abstract

In today's software architecture, large language models (LLMs) serve as software architecture co-pilots. However, no benchmark currently exists to evaluate large language models' actual understanding of cloud-native software architecture. For this reason we present a benchmark called CAKE, which consists of 188 expert-validated questions covering four cognitive levels of Bloom's revised taxonomy (recall, analyze, design, and implement) and five cloud-native topics. Evaluation is conducted on 22 model configurations (0.5B to 70B parameters) across four LLM families, using three-run majority voting for multiple-choice questions (MCQs) and LLM-as-a-judge scoring for free-responses (FR). Based on this evaluation, four notable findings were identified. First, MCQ accuracy plateaus above 3B parameters, with the best model reaching 99.2%. Second, free-response scores scale steadily across…
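The three-run majority voting mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the sample answers, and the tie-handling rule (a question with no strict majority counts as unanswered) are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[str]) -> Optional[str]:
    """Return the option chosen by a strict majority of runs, else None.

    Assumption: a three-way split (no strict majority) yields None,
    which is then scored as incorrect.
    """
    if not answers:
        return None
    option, count = Counter(answers).most_common(1)[0]
    return option if count > len(answers) / 2 else None


def mcq_accuracy(runs_per_question: Sequence[Sequence[str]],
                 answer_key: Sequence[str]) -> float:
    """Fraction of questions whose majority answer matches the key."""
    correct = sum(
        majority_vote(runs) == key
        for runs, key in zip(runs_per_question, answer_key)
    )
    return correct / len(answer_key)


# Hypothetical data: three runs per question, three questions.
runs = [["A", "A", "B"], ["C", "D", "B"], ["D", "D", "D"]]
answer_key = ["A", "C", "D"]
accuracy = mcq_accuracy(runs, answer_key)
print(f"MCQ accuracy: {accuracy:.3f}")
```

In this sketch the second question has no majority answer and is scored as wrong, so two of three questions count as correct. A different tie rule, such as taking the first run's answer, would change the score on split questions.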
