Black-box model classification under the discriminative factorization
Hayden Helm, Merrick Ohata, Carey Priebe

TL;DR
This paper introduces discriminative factorization, a method to evaluate and select query sets for black-box model classification, improving inference accuracy with fewer queries.
Contribution
The paper proposes the discriminative factorization, a framework that predicts the rate at which the probability of chance-level classification decays with the query budget and that guides query set selection for black-box model-level classification.
Findings
On three auditing tasks, estimated factorization parameters predict the empirical performance decay rate.
Query sets selected using the estimated discriminative field match oracle query set ordering.
Probability of chance-level classification decreases exponentially with query budget.
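The exponential-decay finding can be made concrete with a small numerical sketch. This summary does not give the paper's exact bound, so the form p(m) ≈ C · exp(-rate · m) is an assumption. The function name `fit_decay_rate` and the synthetic numbers are hypothetical and are not results from the paper.

```python
import numpy as np

def fit_decay_rate(budgets, chance_probs):
    """Fit log p = log C - rate * m by least squares.

    Assumes (hypothetically) that the probability of chance-level
    classification behaves like C * exp(-rate * m) in the query budget m.
    Returns the pair (rate, C).
    """
    m = np.asarray(budgets, dtype=float)
    p = np.asarray(chance_probs, dtype=float)
    keep = p > 0  # log is undefined at zero, so drop those points
    slope, intercept = np.polyfit(m[keep], np.log(p[keep]), 1)
    return -slope, np.exp(intercept)

# Synthetic illustration only: values generated from the assumed form.
budgets = np.array([1, 2, 4, 8, 16])
synthetic_probs = 0.5 * np.exp(-0.3 * budgets)
rate, prefactor = fit_decay_rate(budgets, synthetic_probs)
```

Under this assumed form, a larger fitted rate means a query set needs fewer queries to push the classifier away from chance. That is one way to compare an estimated decay rate against an empirical one.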
Abstract
Access to modern generative systems is often restricted to querying an API (the "black-box" setting) and many properties of the system are unknown to the user at inference time. While recent work has shown that low-dimensional representations of models based on the relationship between their embedded responses to a set of queries are useful for inferring model-level properties, the quality of these representations is highly sensitive to the query set. We introduce the discriminative factorization to distinguish between high- and low-quality query sets in the context of black-box model-level classification. Under this framework, the probability of chance-level classification decays exponentially in the query budget. On three auditing tasks, estimated factorization parameters predict the empirical performance decay rate. We conclude by showing that query sets selected using the…
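The abstract's "low-dimensional representations of models" can be illustrated with a minimal sketch. The abstract does not specify the construction, so the choices here are assumptions: each model is represented by its embedded responses to the same queries, models are compared by the Frobenius distance between those response matrices, and the distances are embedded with classical multidimensional scaling. The function names and the random data are hypothetical.

```python
import numpy as np

def model_distance_matrix(responses):
    """Pairwise Frobenius distances between models.

    responses: array of shape (n_models, n_queries, embed_dim), holding each
    model's embedded response to each query (assumed layout).
    """
    n_models = responses.shape[0]
    flat = responses.reshape(n_models, -1)
    diff = flat[:, None, :] - flat[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def classical_mds(distances, k):
    """Embed a distance matrix into k dimensions via classical MDS."""
    n = distances.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (distances ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)   # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Random stand-in data: 6 models, 5 queries, 3-dimensional response embeddings.
rng = np.random.default_rng(0)
responses = rng.normal(size=(6, 5, 3))
D = model_distance_matrix(responses)
model_vectors = classical_mds(D, k=2)  # one 2-d vector per model
```

A model-level classifier would then be trained on rows of `model_vectors`. Changing which queries feed into `responses` changes these vectors, which is the sensitivity to the query set that the abstract describes.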
