MLDebugging: Towards Benchmarking Code Debugging Across Multi-Library Scenarios
Jinyang Huang, Xiachong Feng, Qiguang Chen, Hanjie Zhao, Zihui Cheng, Jiesong Bai, Jingxuan Zhou, Min Li, Libo Qin

TL;DR
This paper introduces MLDebugging, a comprehensive benchmark for evaluating code debugging in complex multi-library Python scenarios, revealing current LLMs' limitations in such settings.
Contribution
It presents the first benchmark specifically designed for multi-library debugging, covering 126 libraries and seven issue types, and evaluates LLM performance in this challenging context.
Findings
Current LLMs struggle with multi-library debugging tasks.
MLDebugging reveals significant gaps in LLM capabilities for complex code scenarios.
The benchmark provides a new resource for future research in multi-library debugging.
Abstract
Code debugging is a crucial task in software engineering and is attracting increasing attention. While remarkable success has been achieved in the era of large language models (LLMs), current research still focuses on the simple no-library or single-library setting, ignoring the complex multi-library scenarios found in real-world applications. To address this limitation, we make the first attempt to introduce MLDebugging (Multi-Library Debugging), a comprehensive benchmark designed to assess debugging challenges within multi-library Python code. Specifically, MLDebugging encompasses 126 distinct Python libraries and covers a wide range of multi-library code issues, categorized into seven distinct types. Furthermore, we conduct a thorough evaluation on MLDebugging using both mainstream open-source and closed-source LLMs and highlight that current LLMs still struggle to correctly perform code debugging…
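To make the multi-library setting concrete, the sketch below shows the kind of bug that only appears where two libraries meet. It is an illustrative example, not an item from MLDebugging, and it uses only standard-library modules so it runs anywhere; the function names and the sample data are hypothetical. The `csv` module returns every field as a string, while `statistics.mean` expects numbers, so each call is valid in isolation but the combination fails.

```python
import csv
import io
import statistics

# Hypothetical sample data, used only for this illustration.
CSV_TEXT = "name,score\nada,90\nlin,80\nkai,70\n"


def mean_score_buggy(text):
    """Buggy version: passes csv output straight into statistics."""
    rows = csv.DictReader(io.StringIO(text))
    # csv.DictReader yields every field as a str, so statistics.mean
    # receives strings instead of numbers and raises TypeError.
    return statistics.mean(row["score"] for row in rows)


def mean_score_fixed(text):
    """Fixed version: converts values at the boundary between libraries."""
    rows = csv.DictReader(io.StringIO(text))
    # Convert each field to float before handing it to statistics.mean.
    return statistics.mean(float(row["score"]) for row in rows)


if __name__ == "__main__":
    try:
        mean_score_buggy(CSV_TEXT)
    except TypeError as exc:
        print("buggy version failed:", exc)
    print("fixed version:", mean_score_fixed(CSV_TEXT))
```

Fixing this requires knowing the output contract of one library and the input contract of the other, which is the kind of cross-library reasoning a single-library benchmark would not exercise.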
Taxonomy
Topics: Scientific Computing and Data Management · Parallel Computing and Optimization Techniques · Software Testing and Debugging Techniques
