Active Assessment of Prediction Services as Accuracy Surface Over Attribute Combinations

Vihari Piratla; Soumen Chakrabarti; Sunita Sarawagi

arXiv:2108.06514 · cs.LG · October 27, 2021

TL;DR

This paper introduces AAA, a Gaussian Process-based method to evaluate the accuracy surface of black-box classifiers across diverse attribute combinations, addressing challenges of heteroscedastic uncertainty and data sparsity.

Contribution

The paper proposes novel enhancements to Gaussian Process modeling for accuracy surfaces, including pooling observations and regularizing Beta scale parameters, improving estimation and exploration.

Findings

01. AAA accurately estimates accuracy surfaces across attribute combinations.

02. The enhanced GP approach effectively manages heteroscedastic uncertainty.

03. The method demonstrates superior exploration efficiency in experiments.

Abstract

Our goal is to evaluate the accuracy of a black-box classification model, not as a single aggregate on a given test data distribution, but as a surface over a large number of combinations of attributes characterizing multiple test data distributions. Such attributed accuracy measures become important as machine learning models get deployed as a service, where the training data distribution is hidden from clients, and different clients may be interested in diverse regions of the data distribution. We present Attributed Accuracy Assay (AAA), a Gaussian Process (GP) based probabilistic estimator for such an accuracy surface. Each attribute combination, called an 'arm', is associated with a Beta density from which the service's accuracy is sampled. We expect the GP to smooth the parameters of the Beta density over related arms to mitigate sparsity. We show that obvious application of GPs…
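To make the notion of an 'arm' with its own Beta density concrete, here is a minimal Python sketch. It is not the AAA estimator: it fits an independent Beta posterior per arm from correct/incorrect counts, with no GP smoothing across related arms. The attribute names, the counts, the uniform Beta(1, 1) prior, and the helper `beta_posterior` are all hypothetical and chosen only for illustration.

```python
from itertools import product


def beta_posterior(correct, total, prior_a=1.0, prior_b=1.0):
    """Return (a, b, mean, variance) of the Beta posterior over an arm's accuracy.

    Assumes a Beta(prior_a, prior_b) prior and `correct` successes out of
    `total` labeled examples drawn from that arm (illustrative choice only).
    """
    a = prior_a + correct
    b = prior_b + (total - correct)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return a, b, mean, var


# Hypothetical attributes; each combination of values is one arm.
attributes = {"lighting": ["day", "night"], "weather": ["clear", "rain"]}
arms = list(product(*attributes.values()))

# Hypothetical labeled observations per arm: (correct, total).
observations = {("day", "clear"): (45, 50), ("night", "rain"): (1, 2)}

# Independent per-arm estimate of the accuracy surface (no smoothing).
surface = {}
for arm in arms:
    correct, total = observations.get(arm, (0, 0))
    surface[arm] = beta_posterior(correct, total)

for arm, (a, b, mean, var) in surface.items():
    print(arm, f"Beta({a:.0f}, {b:.0f}) mean={mean:.3f} var={var:.4f}")
```

Arms with few or no observations keep a wide posterior, so their uncertainty differs sharply from well-observed arms. That per-arm sparsity is what the abstract expects the GP to mitigate by sharing information across related arms.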

Code & Models

Repositories

vihari/aaa
PyTorch · Official

Videos

Active Assessment of Prediction Services as Accuracy Surface Over Attribute Combinations · SlidesLive

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning

Methods: Greedy Policy Search · Gaussian Process