COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis

Weiqing Yang; Hanbin Wang; Zhenghao Liu; Xinze Li; Yukun Yan; Shuo Wang; Yu Gu; Minghe Yu; Zhiyuan Liu; Ge Yu

arXiv:2408.05006 · cs.SE · February 13, 2025

TL;DR

This paper introduces DEBUGEVAL, a comprehensive benchmark for evaluating LLMs' multi-stage code debugging abilities, and proposes COAST, a multi-agent data synthesis framework that significantly improves the debugging performance of smaller models.

Contribution

The paper presents DEBUGEVAL for multi-stage debugging evaluation and introduces COAST, a novel multi-agent data synthesis method that enhances small LLMs' debugging capabilities.

Findings

1. 7B-scale models underperform larger models on debugging tasks.
2. COAST-generated data improves the debugging performance of small LLMs.
3. COAST enables 7B models to match the debugging abilities of GPT-3.5.

Abstract

Code debugging is a vital stage of software development, essential for ensuring the reliability and performance of Large Language Models (LLMs) in the code generation task. Human debugging typically follows a multi-stage process, which includes Bug Localization, Bug Identification, Code Repair, and Code Recognition. However, existing code debugging benchmarks predominantly focus on the Code Repair stage, which offers only a limited perspective on evaluating the debugging capabilities of LLMs. In this paper, we introduce DEBUGEVAL, a comprehensive benchmark for evaluating the debugging abilities of LLMs by emulating the multi-stage human debugging process. Through evaluating on DEBUGEVAL, we observe that 7B-scale models consistently underperform compared to their larger counterparts, highlighting their limitations in comprehending code semantics. In this case, we propose the…

Code & Models

Repositories

neuir/coast (official)

Models

ntduc0901/llama3-8b-debugeval-lora (model · 2 downloads)
