exLong: Generating Exceptional Behavior Tests with Large Language Models
Jiyang Zhang, Yu Liu, Pengyu Nie, Junyi Jessy Li, Milos Gligoric

TL;DR
exLong is an LLM-based framework that automatically generates exceptional behavior tests (EBTs), covering the exception-throwing paths that developers tend to neglect in favor of happy paths.
Contribution
This paper introduces exLong, presented by the authors as the first framework that automatically generates exceptional behavior tests (EBTs). exLong is a large language model instruction fine-tuned from CodeLlama.
Findings
exLong outperforms state-of-the-art models such as CAT-LM and GPT-4o at generating EBTs.
23 EBTs generated by exLong were accepted into open-source projects.
exLong embeds reasoning about the traces that lead to throw statements and the conditional expressions that guard them.
Abstract
Many popular programming languages, including C#, Java, and Python, support exceptions. Exceptions are thrown during program execution if an unwanted event happens, e.g., a method is invoked with an illegal argument value. Software developers write exceptional behavior tests (EBTs) to check that their code detects unwanted events and throws appropriate exceptions. Prior research studies have shown the importance of EBTs, but those studies also highlighted that developers put most of their efforts on "happy paths", e.g., paths without unwanted events. To help developers fill the gap, we present the first framework, dubbed exLong, that automatically generates EBTs. exLong is a large language model instruction fine-tuned from CodeLlama and embeds reasoning about traces that lead to throw statements, conditional expressions that guard throw statements, and non-exceptional behavior tests…
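To make the abstract's terminology concrete, here is a minimal sketch of a throw statement, the conditional expression that guards it, a trace that reaches it, a non-exceptional ("happy path") test, and an EBT. It is an illustration only: the Account class, the transfer function, and both test names are hypothetical and do not come from the paper, and Python is used simply because the abstract lists it among languages that support exceptions. This is not exLong's output or input format.

```python
class Account:
    """Hypothetical class under test (not from the paper)."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        # Guard: the conditional expression that protects the throw statement.
        if amount <= 0 or amount > self.balance:
            # Throw statement: reached only when the guard holds.
            raise ValueError("invalid withdrawal amount")
        self.balance -= amount


def transfer(source: Account, target: Account, amount: int) -> None:
    # One trace to the throw statement: transfer -> withdraw -> raise.
    source.withdraw(amount)
    target.balance += amount


def test_transfer_happy_path() -> None:
    # Non-exceptional behavior test: no unwanted event occurs.
    source, target = Account(100), Account(0)
    transfer(source, target, 30)
    assert (source.balance, target.balance) == (70, 30)


def test_transfer_rejects_overdraft() -> None:
    # Exceptional behavior test (EBT): the input is chosen so the guard
    # holds, and the test passes only if the expected exception is raised.
    source, target = Account(100), Account(0)
    try:
        transfer(source, target, 500)
    except ValueError:
        # The failed transfer must leave both balances untouched.
        assert (source.balance, target.balance) == (100, 0)
        return
    raise AssertionError("expected ValueError for an overdraft")
```

Writing the EBT requires knowing which inputs satisfy the guard and which call path reaches the throw statement, which is the kind of context the abstract says exLong reasons about.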
