LLMBisect: Breaking Barriers in Bug Bisection with A Comparative Analysis Pipeline
Zheng Zhang, Haonan Li, Xingyu Li, Hang Zhang, Zhiyun Qian

TL;DR
This paper introduces LLMBisect, a multi-stage pipeline leveraging large language models to improve bug bisection accuracy by fully utilizing patch information and comparing multiple candidates, surpassing existing methods.
Contribution
The paper presents a novel multi-stage LLM-based pipeline for bug bisection that overcomes limitations of traditional methods by integrating textual and code analysis.
Findings
Achieves over 38% higher accuracy than state-of-the-art solutions.
Improves accuracy by 60% over baseline LLM-based bisection.
Effectively utilizes patch information and candidate comparison in bug localization.
Abstract
Bug bisection is an important security task that aims to determine the range of software versions affected by a bug by identifying the commit that introduced it. However, traditional patch-based bisection methods face several significant barriers. They assume that the bug-inducing commit (BIC) and the patch commit modify the same functions, which is not always true. They often rely solely on code changes, even though the commit message frequently contains a wealth of vulnerability-related information. They are also based on simple heuristics (e.g., assuming the BIC initializes lines deleted in the patch) and lack any logical analysis of the vulnerability. In this paper, we make the observation that Large Language Models (LLMs) are well-positioned to break the barriers of existing solutions, e.g., comprehend both textual data and code in patches and…
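To make the "simple heuristics" mentioned in the abstract concrete, here is a minimal Python sketch of a deleted-line heuristic: blame each line the patch deletes and pick the commit that introduced most of them. This is not LLMBisect and not any specific tool's implementation. The function name `guess_bic`, the `blame` mapping, and the commit ids are hypothetical, and the blame data is hard-coded rather than read from a repository.

```python
from collections import Counter
from typing import Dict, List, Optional


def guess_bic(deleted_lines: List[str], blame: Dict[str, str]) -> Optional[str]:
    """Guess the bug-inducing commit from the lines a patch deletes.

    `blame` is a hypothetical mapping from a source line to the id of the
    commit that last introduced it (what a blame tool would report).
    """
    # Each deleted line casts one vote for the commit that introduced it.
    votes = Counter(blame[line] for line in deleted_lines if line in blame)
    if not votes:
        # Nothing to blame, e.g. the patch only adds lines.
        return None
    return votes.most_common(1)[0][0]


# Hypothetical blame data for three lines of one function.
blame = {
    "buf = malloc(len);": "c1",
    "memcpy(buf, src, n);": "c2",
    "free(buf);": "c2",
}

# A patch that deletes two lines introduced by commit c2.
candidate = guess_bic(["memcpy(buf, src, n);", "free(buf);"], blame)

# A patch that only adds a bounds check deletes nothing.
no_candidate = guess_bic([], blame)
```

In this sketch a patch that only adds lines yields no candidate at all, and the commit message is never consulted, which are the kinds of gaps the abstract attributes to heuristic-only bisection.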
