Assessing REST API Test Generation Strategies with Log Coverage

Nana Reinikainen; Mika Mäntylä; Yuqing Wang

arXiv:2604.07073·cs.SE·April 20, 2026

TL;DR

This paper evaluates REST API test generation strategies using novel log coverage metrics, showing how the strategies complement one another and how effectively each uncovers diverse runtime behaviors.

Contribution

It introduces three log coverage metrics for black-box testing and empirically compares evolutionary, LLM-based, and human tests on a microservice system.

Findings

1. On average, Claude Opus 4.6 tests uncover 28.4% more unique log templates than human-written tests.

2. Combining human and Claude tests increases total log coverage by 78.4%.

3. GPT-5.2-Codex tests uncover 38.6% fewer unique log templates than human-written tests but complement the other strategies.

Abstract

Assessing the effectiveness of REST API tests in black-box settings can be challenging because source code coverage metrics are unavailable and the tech stack is often polyglot. We propose three metrics that capture average, minimum, and maximum log coverage to handle the diverse test generation results and runtime behaviors over multiple runs. Using log coverage, we empirically evaluate three REST API test generation strategies on the Light-OAuth2 authorization microservice system: evolutionary computing (EvoMaster v5.0.2), LLMs (Claude Opus 4.6 and GPT-5.2-Codex), and human-written Locust load tests. On average, Claude Opus 4.6 tests uncover 28.4% more unique log templates than human-written tests, whereas EvoMaster and GPT-5.2-Codex find 26.1% and 38.6% fewer, respectively. Next, we analyze combined log coverage to assess complementarity between strategies. Combining human-written tests with…
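As a rough illustration of the average, minimum, and maximum log coverage mentioned in the abstract, the sketch below summarizes the number of unique log templates observed per run and takes the union of templates across strategies as combined coverage. The abstract does not give the paper's exact definitions, so this is an assumed interpretation; the function names, the input format (one set of template IDs per run), and the template IDs are all hypothetical.

```python
from statistics import mean


def coverage_summary(runs):
    """Summarize log coverage over repeated runs of one test strategy.

    `runs` is a list of sets; each set holds the log template IDs
    observed during one run (hypothetical input format, an assumption).
    """
    counts = [len(templates) for templates in runs]
    return {"avg": mean(counts), "min": min(counts), "max": max(counts)}


def combined_templates(*strategies):
    """Union of all templates seen in any run of any given strategy."""
    seen = set()
    for runs in strategies:
        for templates in runs:
            seen |= templates
    return seen


def relative_change_pct(candidate, baseline):
    """Percentage by which `candidate` exceeds (or falls below) `baseline`."""
    return (candidate - baseline) / baseline * 100.0


# Made-up template IDs, for illustration only (not data from the paper).
human_runs = [{"T1", "T2", "T3"}, {"T1", "T2"}, {"T1", "T2", "T3", "T4"}]
llm_runs = [{"T1", "T5", "T6", "T7"}, {"T1", "T5", "T6", "T7"}]

human = coverage_summary(human_runs)
llm = coverage_summary(llm_runs)
combined = combined_templates(human_runs, llm_runs)
```

Under this reading, a statement such as "X% more unique log templates than human-written tests" would correspond to `relative_change_pct(llm["avg"], human["avg"])`, and complementarity shows up when `len(combined)` exceeds what either strategy reaches alone.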
