COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis
Weiqing Yang, Hanbin Wang, Zhenghao Liu, Xinze Li, Yukun Yan, Shuo Wang, Yu Gu, Minghe Yu, Zhiyuan Liu, Ge Yu

TL;DR
This paper introduces DEBUGEVAL, a comprehensive benchmark for evaluating LLMs' multi-stage code debugging abilities, and proposes COAST, a multi-agent data synthesis framework that significantly improves the debugging performance of smaller models.
Contribution
The paper presents DEBUGEVAL for multi-stage debugging evaluation and introduces COAST, a novel multi-agent data synthesis method that enhances small LLMs' debugging capabilities.
Findings
7B-scale models underperform larger models in debugging tasks
COAST-generated data improves small LLM debugging performance
COAST enables 7B models to match GPT-3.5 debugging abilities
Abstract
Code debugging is a vital stage of software development, essential for ensuring the reliability and performance of Large Language Models (LLMs) in the code generation task. Human debugging typically follows a multi-stage process, which includes Bug Localization, Bug Identification, Code Repair, and Code Recognition. However, existing code debugging benchmarks predominantly focus on the Code Repair stage, which offers only a limited perspective on evaluating the debugging capabilities of LLMs. In this paper, we introduce DEBUGEVAL, a comprehensive benchmark for evaluating the debugging abilities of LLMs by emulating the multi-stage human debugging process. Through evaluating on DEBUGEVAL, we observe that 7B-scale models consistently underperform compared to their larger counterparts, highlighting their limitations in comprehending code semantics. In this case, we propose the…
Taxonomy
Topics: Service-Oriented Architecture and Web Services · Multi-Agent Systems and Negotiation · Semantic Web and Ontologies
Methods: Attention Is All You Need · Linear Layer · Weight Decay · Multi-Head Attention · Layer Normalization · Cosine Annealing · Dense Connections
