VIBEPASS: Can Vibe Coders Really Pass the Vibe Check?
Srijan Bansal, Jiao Fangkai, Yilun Zhou, Austin Xu, Shafiq Joty, Semih Yavuz

TL;DR
This paper introduces VIBEPASS, an empirical framework for evaluating large language models' ability to generate diagnostic tests and repair code faults, revealing current limitations in fault reasoning despite high syntactic test validity.
Contribution
The paper presents the first systematic evaluation of models' fault-triggering test generation and fault-targeted repair capabilities, and identifies fault reasoning, rather than code generation, as the bottleneck.
Findings
Models produce valid tests at high rates but struggle with discriminative fault detection.
Fault hypothesis generation is the main bottleneck, not test validity.
Self-generated tests can effectively guide repairs when faults are witnessed.
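To make the difference between a valid test and a discriminative (fault-triggering) test concrete, here is a minimal sketch. It is not the paper's harness: the maximum-subarray problem, both solution functions, and the helpers `is_valid` and `is_discriminative` are hypothetical names chosen purely for illustration. A test is treated as valid when its expected output matches a trusted reference solution, and as discriminative when, in addition, the faulty solution gets it wrong.

```python
from typing import Callable, List

Solution = Callable[[List[int]], int]


def reference_max_subarray(nums: List[int]) -> int:
    """Trusted solution (hypothetical): Kadane's algorithm on a non-empty list."""
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def buggy_max_subarray(nums: List[int]) -> int:
    """Faulty solution (hypothetical): starting from 0 silently allows an
    empty subarray, so it is wrong only when every element is negative."""
    best = cur = 0
    for x in nums:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best


def is_valid(test_input: List[int], expected: int, reference: Solution) -> bool:
    """A test is valid if its expected output agrees with the reference."""
    return reference(test_input) == expected


def is_discriminative(
    test_input: List[int], expected: int, reference: Solution, buggy: Solution
) -> bool:
    """A test witnesses the fault if it is valid and the buggy solution fails it."""
    return is_valid(test_input, expected, reference) and buggy(test_input) != expected


# A typical mixed-sign input: valid, but both solutions agree, so no fault is exposed.
common_case = ([1, -2, 3, 4], 7)
# An all-negative edge case: valid and discriminative, it exposes the latent bug.
edge_case = ([-3, -1, -2], -1)
```

Under these assumptions, `common_case` passes the validity check without separating the two solutions, while `edge_case` does both. This mirrors the finding above that producing valid tests is much easier than producing tests that actually witness the fault.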
Abstract
As Large Language Models shift programming toward human-guided "vibe coding", agentic coding tools increasingly rely on models to self-diagnose and repair their own subtle faults, a capability central to autonomous software engineering yet never systematically evaluated. We present VIBEPASS, the first empirical decomposition that jointly evaluates two coupled tasks: Fault-Triggering Test Generation (FT-Test), constructing a discriminative witness that exposes a latent bug, and Fault-targeted Program Repair (FPR), repairing it under varying diagnostic conditions. VIBEPASS pairs competitive programming problems with LLM-generated solutions that pass partial test suites but fail on semantic edge cases, enabling controlled identification of where the diagnostic chain breaks down. Evaluating 12 frontier LLMs, we find that fault-targeted reasoning does not scale with…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Machine Learning and Algorithms
