Neural-Based Test Oracle Generation: A Large-scale Evaluation and Lessons Learned
Soneya Binta Hossain, Antonio Filieri, Matthew B. Dwyer, Sebastian Elbaum, Willem Visser

TL;DR
This study evaluates the effectiveness of the neural-based test oracle generator TOGA across large-scale real-world Java projects, revealing significant limitations and providing insights for future improvements in automated oracle generation.
Contribution
It offers a comprehensive large-scale evaluation of TOGA, highlighting its limitations and lessons learned for advancing neural-based automated test oracle techniques.
Findings
In the original Defects4J study, TOGA outperformed specification-, search-, and neural-based techniques, detecting 57 bugs, 30 of which no other method found.
It misclassifies oracle types 24% of the time.
Generated assertion oracles have a 47% false positive rate.
Abstract
Defining test oracles is crucial and central to test development, but manual construction of oracles is expensive. While recent neural-based automated test oracle generation techniques have shown promise, their real-world effectiveness remains a compelling question requiring further exploration and understanding. This paper investigates the effectiveness of TOGA, a recently developed neural-based method for automatic test oracle generation by Dinella et al. TOGA utilizes EvoSuite-generated test inputs and generates both exception and assertion oracles. In a Defects4j study, TOGA outperformed specification, search, and neural-based techniques, detecting 57 bugs, including 30 unique bugs not detected by other methods. To gain a deeper understanding of its applicability in real-world settings, we conducted a series of external, extended, and conceptual replication studies of TOGA. In a…
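To make the two oracle kinds named in the abstract concrete, here is a minimal sketch in plain Java. It is not TOGA output and does not use EvoSuite or JUnit: the class `OracleSketch`, its method names, and the choice of `java.util.ArrayDeque` as the unit under test are all assumptions made for illustration. An exception oracle checks that a call throws an expected exception, while an assertion oracle checks a value or state after the call returns normally.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

// Hypothetical illustration only; names and the unit under test are assumptions.
public class OracleSketch {

    // Exception oracle: the check holds only if the call throws the expected exception.
    static boolean exceptionOracleHolds() {
        Deque<Integer> stack = new ArrayDeque<>();   // test prefix: an empty stack
        try {
            stack.pop();                             // call under test
            return false;                            // no exception was thrown, so the oracle is violated
        } catch (NoSuchElementException expected) {
            return true;                             // expected exception observed
        }
    }

    // Assertion oracle: the check holds only if the observed values match the expected ones.
    static boolean assertionOracleHolds() {
        Deque<Integer> stack = new ArrayDeque<>();   // test prefix: push a single element
        stack.push(7);
        int top = stack.pop();                       // call under test
        return top == 7 && stack.isEmpty();          // expected return value and resulting state
    }

    public static void main(String[] args) {
        System.out.println("exception oracle holds: " + exceptionOracleHolds());
        System.out.println("assertion oracle holds: " + assertionOracleHolds());
    }
}
```

In these terms, misclassifying the oracle type would mean expecting an exception where the correct behavior is a normal return, or the reverse, and a false-positive assertion oracle would be one that fails on correct code.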
