Leveraging LLM Inconsistency to Boost Pass@k Performance

Uri Dalal; Meirav Segal; Zvika Ben-Haim; Dan Lahav; Omer Nevo

arXiv:2505.12938 · cs.LG · May 21, 2025

Open Access

TL;DR

This paper introduces a method that leverages the inconsistency of large language models to improve Pass@k performance: it generates k variants of a task and submits one candidate solution per variant. The approach is supported by a probabilistic model of the inconsistency effect and by empirical results on the APPS dataset.

Contribution

The paper proposes a task-agnostic Variator agent that exploits LLM inconsistency to improve Pass@k performance, supported by a probabilistic model of the inconsistency effect and by empirical validation on the APPS dataset.
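As a rough illustration of the agent described above, the following Python sketch produces k variants of a task and collects one candidate solution per variant. The names `variator_candidates`, `make_variant`, `solve`, and `passes_at_k` are hypothetical and not taken from the paper. In practice both callables would presumably wrap LLM calls, and how the paper actually generates variants is not covered here.

```python
from typing import Callable, List


def variator_candidates(
    task: str,
    k: int,
    make_variant: Callable[[str, int], str],  # hypothetical: rewrites the task
    solve: Callable[[str], str],              # hypothetical: one attempt per prompt
) -> List[str]:
    """Return k candidate solutions, one for each variant of `task`."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Generate k reworded versions of the same underlying task.
    variants = [make_variant(task, i) for i in range(k)]
    # Submit exactly one candidate per variant.
    return [solve(variant) for variant in variants]


def passes_at_k(candidates: List[str], is_correct: Callable[[str], bool]) -> bool:
    """Pass@k for a single task: success if any candidate is correct."""
    return any(is_correct(c) for c in candidates)
```

Passing the two model calls in as plain callables keeps the sketch task-agnostic: nothing in it depends on the task being code, text, or anything else.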

Findings

01 Outperforms the baseline on the APPS dataset

02 Inconsistency persists across model generations and domains

03 The method is applicable to various tasks and future models

Abstract

Large language models (LLMs) achieve impressive abilities in numerous domains, but exhibit inconsistent performance in response to minor input changes. Rather than view this as a drawback, in this paper we introduce a novel method for leveraging models' inconsistency to boost Pass@k performance. Specifically, we present a "Variator" agent that generates k variants of a given task and submits one candidate solution for each one. Our variant generation approach is applicable to a wide range of domains as it is task agnostic and compatible with free-form inputs. We demonstrate the efficacy of our agent theoretically using a probabilistic model of the inconsistency effect, and show empirically that it outperforms the baseline on the APPS dataset. Furthermore, we establish that inconsistency persists even in frontier reasoning models across coding and cybersecurity domains, suggesting our…
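To give some intuition for why inconsistency can help Pass@k, here is a toy calculation in Python. It is not the paper's probabilistic model, only a sketch under two stated assumptions: attempts succeed independently, and each variant i has its own success probability p_i. The function names are hypothetical.

```python
import math
from typing import Sequence


def pass_at_k_repeated(p: float, k: int) -> float:
    """Pass@k when the same prompt is sampled k times.

    Assumes each attempt succeeds independently with probability p.
    """
    return 1.0 - (1.0 - p) ** k


def pass_at_k_variants(ps: Sequence[float]) -> float:
    """Pass@k when each of k variants is attempted once.

    Assumes variant i succeeds independently with probability ps[i].
    """
    return 1.0 - math.prod(1.0 - p for p in ps)


# Same mean success rate (0.2), with and without spread across variants.
baseline = pass_at_k_repeated(0.2, 4)                # 1 - 0.8**4 = 0.5904
varied = pass_at_k_variants([0.0, 0.1, 0.2, 0.5])    # 1 - 0.36  = 0.64
```

In this toy setting, spreading the success probability across variants raises Pass@4 from 0.5904 to 0.64 even though the mean is unchanged. By the AM-GM inequality, for a fixed mean the product of the failure probabilities is largest when all p_i are equal, so under these assumptions variation never lowers Pass@k. Real variants may also shift the mean up or down, which this sketch does not capture.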

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education