Assertion-Aware Test Code Summarization with Large Language Models
Anamul Haque Mollah, Ahmed Aljohani, Hyunsook Do

TL;DR
This paper investigates how different prompting strategies influence large language models' ability to generate concise, accurate summaries of Java test code, emphasizing assertion semantics and providing a new benchmark dataset.
Contribution
It introduces a benchmark of 91 real-world Java test cases, conducts an ablation study on prompt components, and evaluates multiple LLMs with various prompt configurations for test code summarization.
Findings
Including assertion semantics in the prompt improves summary quality and reduces input tokens.
Codex and Qwen-Coder align more closely with human-written summaries than the other models evaluated.
DeepSeek underperforms despite its lexical overlap with the reference summaries.
Abstract
Unit tests often lack concise summaries that convey test intent, especially in auto-generated or poorly documented codebases. Large Language Models (LLMs) offer a promising solution, but their effectiveness depends heavily on how they are prompted. Unlike generic code summarization, test-code summarization poses distinct challenges because test methods validate expected behavior through assertions rather than implementing functionality. This paper presents a new benchmark of 91 real-world Java test cases paired with developer-written summaries and conducts a controlled ablation study to investigate how test-code-related components, such as the method under test (MUT), assertion messages, and assertion semantics, affect the performance of LLM-generated test summaries. We evaluate four code LLMs (Codex, Codestral, DeepSeek, and Qwen-Coder) across seven prompt configurations using n-gram…
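The ablation idea described in the abstract, where prompt components are switched on or off, could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `PromptConfig` toggles, the `build_prompt` helper, the instruction wording, and the example test method are all hypothetical, and the paper's seven actual configurations are not specified in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptConfig:
    """Hypothetical toggles for the optional prompt components."""
    include_mut: bool = False
    include_assertion_messages: bool = False
    include_assertion_semantics: bool = False


def build_prompt(test_code, config, mut_code="",
                 assertion_messages=(), assertion_semantics=()):
    """Assemble a summarization prompt from the enabled components.

    The instruction text and section labels are assumptions made for
    illustration; they are not taken from the paper.
    """
    sections = [
        "Summarize the intent of this Java test method in one sentence.",
        "Test method:\n" + test_code,
    ]
    if config.include_mut and mut_code:
        sections.append("Method under test:\n" + mut_code)
    if config.include_assertion_messages and assertion_messages:
        lines = "\n".join(f"- {m}" for m in assertion_messages)
        sections.append("Assertion messages:\n" + lines)
    if config.include_assertion_semantics and assertion_semantics:
        lines = "\n".join(f"- {s}" for s in assertion_semantics)
        sections.append("Assertion semantics:\n" + lines)
    return "\n\n".join(sections)


# Hypothetical example: a baseline prompt and a variant with assertion semantics.
test_code = "@Test void addsTwoNumbers() { assertEquals(5, calc.add(2, 3)); }"
baseline = build_prompt(test_code, PromptConfig())
with_semantics = build_prompt(
    test_code,
    PromptConfig(include_assertion_semantics=True),
    assertion_semantics=["add(2, 3) is expected to equal 5"],
)
```

Keeping each component behind its own flag means any two configurations differ only in the sections that were toggled, which is what makes a component-by-component comparison possible.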
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Topic Modeling
