When to Accept Automated Predictions and When to Defer to Human Judgment?
Daniel Sikar, Artur Garcez, Tillman Weyde, Robin Bloomfield, Kaleem Peeroo

TL;DR
This paper introduces a clustering-based metric to assess the reliability of neural network predictions under distribution shifts, helping decide when to trust automated decisions or defer to humans.
Contribution
It proposes a novel distance-based confidence measure using clustering of neural network outputs to evaluate prediction safety during distribution shifts.
Findings
The proposed metric correlates with prediction correctness under distribution shifts.
It is effective across different datasets and neural network architectures.
The approach can guide when to accept automated predictions or defer to humans.
Abstract
Ensuring the reliability and safety of automated decision-making is crucial. It is well-known that data distribution shifts in machine learning can produce unreliable outcomes. This paper proposes a new approach for measuring the reliability of predictions under distribution shifts. We analyze how the outputs of a trained neural network change using clustering to measure distances between outputs and class centroids. We propose this distance as a metric to evaluate the confidence of predictions under distribution shifts. We assign each prediction to a cluster with centroid representing the mean softmax output for all correct predictions of a given class. We then define a safety threshold for a class as the smallest distance from an incorrect prediction to the given class centroid. We evaluate the approach on the MNIST and CIFAR-10 datasets using a Convolutional Neural Network and a…
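The centroid and threshold construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes Euclidean distance (the abstract does not name the metric), assumes an incorrect prediction is attributed to the class the network predicted, and treats a class with no observed incorrect predictions as having an unbounded threshold. The function names `fit_centroids_and_thresholds` and `accept_or_defer` are hypothetical.

```python
import numpy as np


def fit_centroids_and_thresholds(softmax, labels):
    """Estimate per-class centroids and safety thresholds.

    softmax: (n, k) array of softmax outputs from a trained network.
    labels:  (n,) array of true class indices.
    """
    preds = softmax.argmax(axis=1)
    correct = preds == labels
    k = softmax.shape[1]
    centroids = np.full((k, k), np.nan)   # NaN marks a class with no correct predictions
    thresholds = np.full(k, np.inf)       # assumption: no observed errors -> no bound
    for c in range(k):
        hit = correct & (preds == c)      # correct predictions of class c
        miss = ~correct & (preds == c)    # incorrect predictions assigned to class c
        if hit.any():
            # Centroid: mean softmax output over correct predictions of class c.
            centroids[c] = softmax[hit].mean(axis=0)
        if hit.any() and miss.any():
            # Threshold: smallest distance from an incorrect prediction to the centroid.
            # Euclidean distance is an assumption made for this sketch.
            dist = np.linalg.norm(softmax[miss] - centroids[c], axis=1)
            thresholds[c] = dist.min()
    return centroids, thresholds


def accept_or_defer(softmax, centroids, thresholds):
    """Return predicted classes and a boolean mask: True = accept, False = defer."""
    preds = softmax.argmax(axis=1)
    dist = np.linalg.norm(softmax - centroids[preds], axis=1)
    # Accept only predictions strictly closer to the centroid than any observed error.
    # A NaN distance (class without a centroid) compares False, so it is deferred.
    return preds, dist < thresholds[preds]
```

In use, the centroids and thresholds would be fitted on labelled data, and `accept_or_defer` would then be applied to softmax outputs from possibly shifted inputs; predictions flagged `False` would be handed to a human reviewer.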
Taxonomy
Topics: Forecasting Techniques and Applications
Methods: Attention Is All You Need · Residual Connection · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Adam · Dropout · Dense Connections · Absolute Position Encodings
