Predicting the Performance of Black-box LLMs through Follow-up Queries

Dylan Sam; Marc Finzi; J. Zico Kolter

arXiv:2501.01558 · cs.LG · December 2, 2025

TL;DR

This paper introduces a method to predict black-box language model behavior by using follow-up queries and response probabilities, enabling detection of correctness, adversarial manipulation, and model identity.
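The idea of using response probabilities as a representation can be sketched in a few lines. This is a minimal illustration that assumes the API exposes log-probabilities for the answer tokens "Yes" and "No". The prompts in `FOLLOWUP_PROMPTS` and the helpers `yes_probability` and `followup_features` are hypothetical names made up for this sketch. They are not the paper's prompt set or code.

```python
import math

# Hypothetical follow-up prompts for illustration; the paper's actual prompt set may differ.
FOLLOWUP_PROMPTS = [
    "Are you confident in your answer?",
    "Would you change your answer if asked again?",
    "Is your answer consistent with the question?",
]


def yes_probability(logprob_yes: float, logprob_no: float) -> float:
    """Renormalize two answer-token log-probabilities into P(Yes | Yes or No).

    Computed as a numerically stable sigmoid of the log-odds, so very negative
    log-probabilities do not underflow to 0 / 0.
    """
    log_odds = logprob_yes - logprob_no
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def followup_features(responses: list[tuple[float, float]]) -> list[float]:
    """Turn one (logprob_yes, logprob_no) pair per follow-up prompt into a feature vector.

    Each pair is assumed to come from appending the matching prompt in
    FOLLOWUP_PROMPTS to the original conversation and reading the API's
    log-probabilities for the tokens "Yes" and "No".
    """
    if len(responses) != len(FOLLOWUP_PROMPTS):
        raise ValueError("expected one response per follow-up prompt")
    return [yes_probability(lp_yes, lp_no) for lp_yes, lp_no in responses]


# Made-up log-probabilities standing in for three API calls.
features = followup_features([(-0.1, -2.5), (-3.0, -0.05), (-0.7, -0.7)])
print([round(f, 3) for f in features])
```

Renormalizing over just the two answer tokens keeps each feature in [0, 1], even when the model spreads probability mass over other tokens.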

Contribution

It presents a novel approach that leverages follow-up question responses to reliably predict model correctness and detect adversarial or misrepresented models in black-box settings.
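The misrepresentation check can be pictured as follows. Collect follow-up feature vectors from models whose identity is known, then ask which known model a claimed API's responses most resemble. This sketch uses a nearest-centroid rule as a deliberately simple stand-in for a trained classifier. The model names, feature values, and the `identify_model` helper are all hypothetical.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Average equal-length feature vectors coordinate-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def identify_model(reference: dict[str, list[list[float]]],
                   observed: list[list[float]]) -> str:
    """Return the reference model whose mean follow-up profile is closest
    (in Euclidean distance) to the mean profile of the observed responses."""
    target = centroid(observed)
    best_name, best_dist = "", math.inf
    for name, vectors in reference.items():
        dist = math.dist(centroid(vectors), target)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


# Hypothetical reference profiles (one follow-up feature vector per query) for two known models.
reference = {
    "model-large": [[0.90, 0.10, 0.60], [0.85, 0.15, 0.55]],
    "model-small": [[0.60, 0.40, 0.50], [0.55, 0.45, 0.45]],
}

# Responses from an API that claims to serve "model-large".
claimed = "model-large"
observed = [[0.58, 0.42, 0.48], [0.62, 0.38, 0.52]]
predicted = identify_model(reference, observed)
if predicted != claimed:
    print(f"Possible misrepresentation: responses resemble {predicted}, not {claimed}")
```

Averaging over several queries before comparing makes the decision less sensitive to any single noisy response.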

Findings

01

Linear models on follow-up responses predict correctness accurately.

02

Follow-up responses distinguish between clean and adversarially manipulated models.

03

The method can outperform some white-box linear predictors and detects model misrepresentation.
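The first finding amounts to fitting a linear predictor on follow-up features. The following is a minimal sketch using plain L2-regularized logistic regression trained by gradient descent. It assumes the feature vectors (probabilities of "Yes" to each follow-up question) have already been collected. The toy data, hyperparameters, and helper names are invented for illustration, and the paper's own training setup may differ.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: list[list[float]], y: list[int], lr: float = 0.5,
                   epochs: int = 2000, l2: float = 1e-3) -> tuple[list[float], float]:
    """Full-batch gradient descent on mean log loss plus (l2 / 2) * ||w||^2 (bias unregularized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Predicted probability that the original answer was correct."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy follow-up feature vectors (made up); label 1 = the model's original answer was correct.
X = [[0.92, 0.05, 0.70], [0.88, 0.10, 0.65], [0.95, 0.08, 0.80],
     [0.40, 0.60, 0.30], [0.35, 0.70, 0.45], [0.50, 0.55, 0.20]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
accuracy = sum((predict_proba(w, b, x) >= 0.5) == bool(t) for x, t in zip(X, y)) / len(y)
print(f"train accuracy: {accuracy:.2f}")
```

Only the labels change for the second finding: label each feature vector by whether it came from a clean or an adversarially manipulated model, and the same linear recipe applies.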

Abstract

Reliably predicting the behavior of language models, such as whether their outputs are correct or have been adversarially manipulated, is a fundamentally challenging task. This is often made even more difficult as frontier language models are offered only through closed-source APIs, providing only black-box access. In this paper, we predict the behavior of black-box language models by asking follow-up questions and taking the probabilities of responses as representations to train reliable predictors. We first demonstrate that training a linear model on these responses reliably and accurately predicts model correctness on question-answering and reasoning benchmarks. Surprisingly, this can even outperform white-box linear predictors that operate over model internals or activations. Furthermore, we demonstrate that these follow-up question responses can reliably…

Code & Models

Repositories

dsam99/quere · PyTorch · Official

Taxonomy

Topics: Imbalanced Data Classification Techniques · Data Mining Algorithms and Applications

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Multi-Head Attention · Layer Normalization · Byte Pair Encoding · Linear Warmup With Cosine Annealing · Dense Connections