Time Travel: LLM-Assisted Semantic Behavior Localization with Git Bisect

Yujing Wang; Weize Hong

arXiv:2511.18854·cs.SE·November 25, 2025

Open Access

TL;DR

This paper introduces a novel LLM-augmented framework for semantic fault localization in Git bisect, improving success rates and reducing bisect time by handling noisy, non-deterministic software behaviors.

Contribution

The framework integrates structured reasoning and LLM fine-tuning into Git bisect, addressing the challenges that flaky tests and semantic divergence pose in modern software development.

Findings

01. 6.4 percentage point increase in success rate

02. Up to 2x reduction in bisect time

03. Effective handling of noisy, non-deterministic faults
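To make the "noisy, non-deterministic" setting concrete, here is a minimal sketch of a bisect loop that tolerates a flaky predicate by re-running it and taking a majority vote. This is not the paper's LLM-based method, only an illustration of why a single pass/fail check per commit is fragile. The names `majority_verdict`, `noisy_bisect`, and the `votes` parameter are hypothetical, and the sketch assumes a single good-to-bad transition in the commit range.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_verdict(is_bad: Callable[[str], bool], commit: str, votes: int = 5) -> bool:
    """Run a possibly flaky predicate `votes` times and return the majority answer.

    Use an odd `votes` value; on a tie the commit is treated as good.
    """
    tally = Counter(is_bad(commit) for _ in range(votes))
    return tally[True] > tally[False]


def noisy_bisect(commits: Sequence[str], is_bad: Callable[[str], bool], votes: int = 5) -> str:
    """Return the first bad commit by binary search.

    Assumes commits[0] is good, commits[-1] is bad, and there is exactly
    one good-to-bad transition (hypothetical simplification).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if majority_verdict(is_bad, commits[mid], votes):
            hi = mid  # first bad commit is at mid or earlier
        else:
            lo = mid  # first bad commit is after mid
    return commits[hi]
```

Voting multiplies the number of test runs per step by `votes`, so it trades bisect time for robustness, and it does not help with non-monotonic regressions, where the single-transition assumption itself fails.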

Abstract

We present a novel framework that integrates Large Language Models (LLMs) into the Git bisect process for semantic fault localization. Traditional bisect assumes deterministic predicates and binary failure states, assumptions often violated in modern software development due to flaky tests, non-monotonic regressions, and semantic divergence from upstream repositories. Our system augments bisect traversal with structured chain-of-thought reasoning, enabling commit-by-commit analysis under noisy conditions. We evaluate multiple open-source and proprietary LLMs for their suitability and fine-tune DeepSeekCoderV2 using QLoRA on a curated dataset of semantically labeled diffs. We adopt a weak supervision workflow to reduce annotation overhead, incorporating human-in-the-loop corrections and self-consistency filtering. Experiments across multiple open-source projects show a 6.4 point absolute…
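The abstract does not spell out how self-consistency filtering is applied. One common reading, sketched below as an assumption rather than the authors' procedure, is to sample several labels for the same diff and keep the majority label only when agreement is high enough. The function name, the `min_agreement` threshold, and the example labels are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def self_consistent_label(samples: Sequence[str], min_agreement: float = 0.8) -> Optional[str]:
    """Return the majority label if enough samples agree, otherwise None.

    `samples` holds labels produced by repeated sampling for one diff
    (hypothetical setup); None means the example is dropped or sent
    to a human reviewer.
    """
    if not samples:
        return None
    label, count = Counter(samples).most_common(1)[0]
    return label if count / len(samples) >= min_agreement else None


# Hypothetical labels for two diffs: the first is kept, the second is rejected.
kept = self_consistent_label(["bug-fix", "bug-fix", "bug-fix", "bug-fix", "refactor"])
rejected = self_consistent_label(["bug-fix", "refactor", "bug-fix", "refactor", "feature"])
```

Rejected examples are natural candidates for the human-in-the-loop corrections the abstract mentions, since they mark the cases where automatic labels disagree.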

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Software Testing and Debugging Techniques