Can Language Models Pass Software Testing Certification Exams? a case study

Fitash Ul Haq; Jordi Cabot

arXiv:2603.23142·cs.SE·March 25, 2026

TL;DR

This study evaluates whether large language models can pass software testing certification exams, analyzing their understanding, reasoning, and performance across different question types and transformations.

Contribution

The study provides a comprehensive assessment of 60 LLMs on ISTQB exams, revealing their capabilities and limitations in software testing knowledge and reasoning.

Findings

01

Two models passed all certification exams, scoring at least 65% on each.

02

Commercial models generally outperform open-source models.

03

Metamorphic transformations of the questions affect models' ability to answer correctly.
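
To make the idea of a metamorphic transformation concrete, here is a minimal sketch, assuming a multiple-choice question format and using option shuffling as the transformation. The transformations actually used in the paper are not described in this summary; `Question`, `shuffle_options`, and `is_consistent` are hypothetical names introduced for illustration, and `model` stands for any callable that returns the index of the option it chooses.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    stem: str
    options: tuple[str, ...]
    answer_index: int  # position of the correct option in `options`


def shuffle_options(q: Question, seed: int) -> Question:
    """Reorder the options without changing the question's meaning.

    Assumption: shuffling is only one possible metamorphic transformation.
    """
    rng = random.Random(seed)
    order = list(range(len(q.options)))
    rng.shuffle(order)
    new_options = tuple(q.options[i] for i in order)
    # The correct option moved; record its new position.
    return Question(q.stem, new_options, order.index(q.answer_index))


def is_consistent(model, q: Question, seed: int = 0) -> bool:
    """True if the model picks the same option text before and after shuffling."""
    transformed = shuffle_options(q, seed)
    return q.options[model(q)] == transformed.options[model(transformed)]


# Hypothetical sample question, not taken from any ISTQB exam.
sample = Question(
    stem="Which technique derives tests from the edges of input ranges?",
    options=(
        "Boundary value analysis",
        "Statement coverage",
        "Exploratory testing",
        "Static review",
    ),
    answer_index=0,
)
```

A model that answers from content should give the same option text on both versions, while a model that is sensitive to option position may not. Comparing the two answers is what lets a transformed question probe understanding rather than memorized answer positions.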

Abstract

Large Language Models (LLMs) play a pivotal role in both academic research and broader societal applications. LLMs are increasingly used in software testing activities such as test case generation, selection, and repair. However, several important questions remain: (1) do LLMs possess enough information about software testing principles to perform software testing tasks effectively? (2) do LLMs possess sufficient conceptual understanding of software testing to answer software testing questions under metamorphic transformations? and (3) do certain properties of software testing questions influence the performance of LLMs? To answer these questions, this study evaluates 60 multimodal language models from both commercial vendors and the open-source community. The evaluation is performed using 30 sample exams of different types (core foundation, core advanced, specialist, and expert) from…

Taxonomy

Topics: Software Testing and Debugging Techniques · Software System Performance and Reliability · Software Engineering Techniques and Practices