LLM-Based Robustness Testing of Microservice Applications: An Empirical Study

Hrushitha Goud Tigulla; Marco Vieira

arXiv:2605.14202 · cs.SE · May 15, 2026

TL;DR

This study evaluates how different prompt strategies and models affect the diversity and effectiveness of LLM-generated robustness tests for microservice APIs, revealing that prompt design significantly impacts failure coverage.

Contribution

The paper introduces and empirically compares prompt strategies, including GuidedFewShot, showing how they influence failure detection and why domain context matters in LLM-based testing.

Findings

01. Prompt strategy influences failure diversity more than model size.
02. GuidedFewShot achieves high failure-mode coverage with low similarity.
03. Taxonomy rules alone are insufficient without concrete examples.
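
To make "coverage" and "similarity" in the findings above concrete, here is a minimal sketch of one way they could be computed over the failure modes each run exposes. The summary does not state the paper's actual metrics, so the Jaccard measure, the function names, and the run and failure-mode labels below are all assumptions for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_similarity(runs: dict) -> float:
    """Average Jaccard similarity over all pairs of runs (assumed metric)."""
    pairs = list(combinations(runs.values(), 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def failure_mode_coverage(runs: dict, all_modes: set) -> float:
    """Fraction of known failure modes exposed by at least one run."""
    found = set().union(*runs.values()) & all_modes
    return len(found) / len(all_modes)


# Hypothetical runs: each maps a run label to the failure modes it triggered.
runs = {
    "run_a": {"F1", "F2"},
    "run_b": {"F2", "F3"},
    "run_c": {"F4"},
}
all_modes = {"F1", "F2", "F3", "F4", "F5"}

similarity = mean_pairwise_similarity(runs)
coverage = failure_mode_coverage(runs, all_modes)
```

Under this reading, a strategy that scores high on coverage and low on mean pairwise similarity finds many failure modes, and its runs do not converge on the same ones.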

Abstract

Malformed, missing, or boundary-value inputs in microservice APIs can cascade across dependent services, threatening reliability. Robustness testing systematically exercises such inputs to expose server-side failures, but generating diverse, effective tests remains challenging. Large Language Models can generate such tests from API specifications; however, it is unknown whether different models and prompt strategies produce diverse failure sets or converge on the same failures. We report a controlled experiment applying 7 prompt strategies to 3 open-source LLMs (14B-70B parameters) targeting 2 architecturally distinct microservice systems: one Java monolingual (6 services, 9 failure modes) and one polyglot (27 services, 14 failure modes), yielding 38 valid runs and 663 generated tests. We find that prompt strategy explains more variation in diversity than model size: a Structured prompt…
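
The three input classes named in the abstract (malformed, missing, and boundary-value) can be illustrated with a small hand-written mutator. This is only a sketch of the kind of input a robustness test exercises, not the LLM-based generation the paper studies. The payload fields, the boundary and malformed values, and the function name are hypothetical.

```python
import copy

# Assumed boundary values per Python type; a real suite would derive these
# from the API specification (e.g. min/max constraints in a schema).
BOUNDARY_VALUES = {
    int: [0, -1, 2**31 - 1, -(2**31)],
    str: ["", "a" * 10_000],
}

# Assumed wrong-type replacements used to produce malformed inputs.
MALFORMED_VALUES = {
    int: "not-a-number",
    str: 12345,
}


def robustness_variants(valid: dict) -> list:
    """Return (label, payload) pairs derived from one valid request body."""
    variants = []
    for field, value in valid.items():
        # Missing input: drop the field entirely.
        missing = copy.deepcopy(valid)
        del missing[field]
        variants.append((f"missing:{field}", missing))

        # Missing input: field present but null.
        null = copy.deepcopy(valid)
        null[field] = None
        variants.append((f"null:{field}", null))

        kind = type(value)

        # Malformed input: value of the wrong type.
        if kind in MALFORMED_VALUES:
            malformed = copy.deepcopy(valid)
            malformed[field] = MALFORMED_VALUES[kind]
            variants.append((f"malformed:{field}", malformed))

        # Boundary-value input: extremes for the field's type.
        for i, boundary in enumerate(BOUNDARY_VALUES.get(kind, [])):
            mutated = copy.deepcopy(valid)
            mutated[field] = boundary
            variants.append((f"boundary{i}:{field}", mutated))
    return variants


# Hypothetical valid request body for some service endpoint.
valid_payload = {"userId": 7, "name": "Ada"}
variants = robustness_variants(valid_payload)
```

Each variant would be sent to the endpoint under test, and a server-side error such as an HTTP 5xx response would be recorded as a candidate robustness failure.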
