Evaluating Software Process Models for Multi-Agent Class-Level Code Generation

Wasique Islam Shafin; Md Nakhla Rafi; Zhenhao Li; Tse-Hsun Chen

arXiv:2511.09794 · cs.SE · November 14, 2025

Open Access

TL;DR

This study evaluates how structured multi-agent workflows influence large language model performance in class-level code generation, revealing trade-offs between code quality, correctness, and failure modes.

Contribution

The paper provides an empirical analysis of process-structured multi-agent LLM workflows, showing how they affect code quality and failure characteristics in class-level code generation.

Findings

01. Waterfall workflows produce cleaner, more maintainable code but reduce functional correctness.

02. Process constraints shift failure types from structural to semantic errors.

03. Testing improves verification but introduces new reasoning failures.

Abstract

Modern software systems require code that is not only functional but also maintainable and well-structured. Although Large Language Models (LLMs) are increasingly used to automate software development, most studies focus on isolated, single-agent function-level generation. This work examines how process structure and role specialization shape multi-agent LLM workflows for class-level code generation. We simulate a Waterfall-style development cycle covering Requirement, Design, Implementation, and Testing using three LLMs (GPT-4o-mini, DeepSeek-Chat, and Claude-3.5-Haiku) on 100 Python tasks from the ClassEval benchmark. Our findings show that multi-agent workflows reorganize, rather than consistently enhance, model performance. Waterfall-style collaboration produces cleaner and more maintainable code but often reduces functional correctness (-37.8% for GPT-4o-mini and -39.8% for…
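
The four-stage cycle named in the abstract can be pictured as a sequential hand-off between role-specialized agents. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the names `Agent`, `WaterfallRun`, `run_waterfall`, and `make_stub` are hypothetical, the agents are stubs standing in for real LLM calls, and the rule that each stage sees the task plus all upstream artifacts is an assumption about how context is passed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical type: an "agent" is any function mapping a prompt to a reply.
# In a real setup this would wrap a call to an LLM; here it is a plain callable.
Agent = Callable[[str], str]

# Stage names taken from the abstract; the strict ordering is the Waterfall part.
STAGES = ["Requirement", "Design", "Implementation", "Testing"]


@dataclass
class WaterfallRun:
    """Holds the task description and the artifact produced by each stage."""
    task: str
    artifacts: Dict[str, str] = field(default_factory=dict)


def run_waterfall(task: str, agents: Dict[str, Agent]) -> WaterfallRun:
    """Run each stage once, in order, with no feedback to earlier stages."""
    run = WaterfallRun(task=task)
    context = task
    for stage in STAGES:
        output = agents[stage](context)
        run.artifacts[stage] = output
        # Assumption: downstream stages receive the task plus every
        # upstream artifact, appended as labeled sections.
        context = f"{context}\n\n[{stage}]\n{output}"
    return run


def make_stub(role: str) -> Agent:
    """Build a placeholder agent that only reports how much context it saw."""
    def agent(prompt: str) -> str:
        return f"{role} output ({len(prompt)} chars of context)"
    return agent


agents = {stage: make_stub(stage) for stage in STAGES}
result = run_waterfall("Implement a BankAccount class", agents)
for stage, artifact in result.artifacts.items():
    print(f"{stage}: {artifact}")
```

Because each stage consumes only what earlier stages produced, a misreading at the Requirement or Design stage is carried forward unchanged, which is one plausible way to think about the reported shift from structural to semantic errors.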

Taxonomy

Topics: Software Engineering Techniques and Practices · Software Engineering Research · Multi-Agent Systems and Negotiation