Understanding the Characteristics of LLM-Generated Property-Based Tests in Exploring Edge Cases
Hidetake Tanaka, Haruto Tanaka, Kazumasa Shimari, Kenichi Matsumoto

TL;DR
This paper compares LLM-generated property-based tests (PBT) and example-based tests (EBT) for finding edge-case bugs, and shows that combining the two raises the bug detection rate from 68.75% to 81.25% on 16 HumanEval problems.
Contribution
It provides an empirical analysis of how LLM-generated property-based tests (PBT) and example-based tests (EBT) differ in exploring edge cases, highlights their complementary strengths, and proposes a hybrid testing approach.
Findings
Each method alone detects 68.75% of bugs; combining PBT and EBT raises detection to 81.25%.
PBT excels at detecting performance issues and edge cases.
EBT effectively identifies boundary conditions and special input patterns.
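The contrast between the two styles can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `running_max` is a hypothetical buggy function, not one of the HumanEval problems, and the tests are hand-written with the standard library rather than generated by an LLM or by a PBT framework. The example-based test checks fixed input/output pairs, while the property-based test draws random inputs and checks an invariant that must hold for all of them.

```python
import random


def running_max(xs):
    """Hypothetical function under test: prefix maxima of a list.

    Deliberate bug: the running maximum is seeded with 0, so any prefix
    containing only negative numbers is reported as 0.
    """
    best = 0  # bug: should start from the first element, not 0
    out = []
    for x in xs:
        best = max(best, x)
        out.append(best)
    return out


def example_based_failures():
    """EBT: fixed (input, expected output) pairs, including a boundary case."""
    cases = [
        ([1, 3, 2], [1, 3, 3]),   # typical input, passes despite the bug
        ([], []),                 # empty input, passes
        ([-2, -5], [-2, -2]),     # all-negative boundary case, exposes the bug
    ]
    return [inp for inp, expected in cases if running_max(inp) != expected]


def property_based_counterexample(trials=200, seed=0):
    """PBT: random inputs checked against the invariant out[i] == max(xs[:i+1])."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 8))]
        out = running_max(xs)
        if any(out[i] != max(xs[: i + 1]) for i in range(len(xs))):
            return xs  # first randomly generated input violating the property
    return None


if __name__ == "__main__":
    print("EBT failing inputs:", example_based_failures())
    print("PBT counterexample:", property_based_counterexample())
```

Here the example-based test only catches the bug because someone thought to include an all-negative case, whereas the property-based test finds a failing input without anyone naming it in advance. A real PBT library would additionally shrink the counterexample to a minimal one, which this sketch does not attempt.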
Abstract
As Large Language Models (LLMs) increasingly generate code in software development, ensuring the quality of LLM-generated code has become important. Traditional testing approaches using Example-based Testing (EBT) often miss edge cases: defects that occur at boundary values, special input patterns, or extreme conditions. This research investigates the characteristics of LLM-generated Property-based Testing (PBT) compared to EBT for exploring edge cases. We analyze 16 HumanEval problems where standard solutions failed on extended test cases, generating both PBT and EBT test codes using Claude-4-sonnet. Our experimental results reveal that while each method individually achieved a 68.75% bug detection rate, combining both approaches improved detection to 81.25%. The analysis demonstrates complementary characteristics: PBT effectively detects performance issues and edge cases through…
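The reported percentages can be turned back into problem counts with a short calculation. This sketch assumes that all three rates are measured over the same 16 problems; the overlap figures it derives follow from inclusion-exclusion and are not stated in the abstract itself.

```python
from fractions import Fraction

TOTAL_PROBLEMS = 16  # HumanEval problems analyzed, per the abstract


def detected_count(rate_percent: str) -> Fraction:
    """Convert a detection rate in percent to a number of problems."""
    return Fraction(rate_percent) * TOTAL_PROBLEMS / 100


pbt = detected_count("68.75")       # bugs detected by PBT alone
ebt = detected_count("68.75")       # bugs detected by EBT alone
combined = detected_count("81.25")  # bugs detected by at least one method

# Inclusion-exclusion: |PBT or EBT| = |PBT| + |EBT| - |PBT and EBT|
both = pbt + ebt - combined
only_pbt = pbt - both
only_ebt = ebt - both
missed = TOTAL_PROBLEMS - combined

if __name__ == "__main__":
    print(f"PBT: {pbt}/16, EBT: {ebt}/16, combined: {combined}/16")
    print(f"both: {both}, PBT only: {only_pbt}, EBT only: {only_ebt}, missed: {missed}")
```

Under that assumption, each method detects 11 of 16 bugs and the combination detects 13, which implies 9 bugs found by both methods, 2 found only by PBT, 2 found only by EBT, and 3 found by neither.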
