Understanding on the Edge: LLM-generated Boundary Test Explanations
Sabinakhon Akbarova, Felix Dobslaw, Robert Feldt

TL;DR
This study evaluates how well GPT-4.1 generates boundary explanations for software testing. The explanations were rated as generally helpful, but they need clearer structure and more actionable content to be useful in practice.
Contribution
It provides an empirical assessment of LLM-generated boundary explanations and proposes design criteria for improving their usefulness in software testing.
Findings
63.5% of all ratings were positive (4-5 on a five-point Likert scale); 17% were negative (1-2)
Participants prefer structured, authoritative, and tailored explanations
Actionable examples are crucial for debugging support
Abstract
Boundary value analysis and testing (BVT) is fundamental in software quality assurance because faults tend to cluster at input extremes, yet testers often struggle to understand and justify why certain input-output pairs represent meaningful behavioral boundaries. Large Language Models (LLMs) could help by producing natural-language rationales, but their value for BVT has not been empirically assessed. We therefore conducted an exploratory study on LLM-generated boundary explanations: in a survey, twenty-seven software professionals rated GPT-4.1 explanations for twenty boundary pairs on clarity, correctness, completeness and perceived usefulness, and six of them elaborated in follow-up interviews. Overall, 63.5% of all ratings were positive (4-5 on a five-point Likert scale) compared to 17% negative (1-2), indicating general agreement but also variability in perceptions. Participants…
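To make the notion of a boundary pair concrete, the following is a minimal sketch and is not taken from the paper. The function `classify_age`, its threshold of 18, and the helper `find_boundary_pairs` are all hypothetical names chosen for illustration. The sketch scans an integer range and reports adjacent inputs whose outputs differ, which is one simple way to think of an input-output pair that marks a behavioral boundary.

```python
from typing import Callable, List, Tuple


def classify_age(age: int) -> str:
    """Hypothetical function under test; the threshold 18 is an assumption."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return "minor" if age < 18 else "adult"


def find_boundary_pairs(
    func: Callable[[int], str], low: int, high: int
) -> List[Tuple[Tuple[int, str], Tuple[int, str]]]:
    """Return adjacent inputs (x, x + 1) in [low, high] whose outputs differ."""
    pairs = []
    for x in range(low, high):
        left, right = func(x), func(x + 1)
        if left != right:
            # The behavior changes between x and x + 1: a candidate boundary.
            pairs.append(((x, left), (x + 1, right)))
    return pairs


pairs = find_boundary_pairs(classify_age, 0, 120)
print(pairs)  # [((17, 'minor'), (18, 'adult'))]
```

An explanation of such a pair would then state why the output changes between 17 and 18, which is the kind of rationale the study asks participants to rate.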
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Engineering Techniques and Practices
