A Tale Of Two Long Tails
Daniel D'souza, Zach Nussbaum, Chirag Agarwal, Sara Hooker

TL;DR
This paper investigates where the uncertainty in a machine learning model's predictions comes from. It applies targeted data augmentation, during training, to the examples the model is uncertain about, and uses the response to distinguish types of uncertain examples such as atypical and noisy ones.
Contribution
It introduces a method for targeted data augmentation to analyze and distinguish different sources of uncertainty during model training.
Findings
Targeted augmentation improves understanding of uncertainty sources.
Atypical and noisy examples are learned at different rates.
Interventions can effectively characterize uncertainty types.
Abstract
As machine learning models are increasingly employed to assist human decision-makers, it becomes critical to communicate the uncertainty associated with these model predictions. However, the majority of work on uncertainty has focused on traditional probabilistic or ranking approaches - where the model assigns low probabilities or scores to uncertain examples. While this captures what examples are challenging for the model, it does not capture the underlying source of the uncertainty. In this work, we seek to identify examples the model is uncertain about and characterize the source of said uncertainty. We explore the benefits of designing a targeted intervention - targeted data augmentation of the examples where the model is uncertain over the course of training. We investigate whether the rate of learning in the presence of additional information differs between atypical and noisy…
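The intervention described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes per-example loss as the uncertainty score, a horizontal flip as the augmentation, and the slope of each example's loss over epochs as a proxy for its rate of learning. The function names (`select_uncertain`, `augment_targeted`, `loss_slope`) are hypothetical.

```python
import numpy as np


def select_uncertain(losses, fraction=0.1):
    """Return indices of the highest-loss examples.

    Assumption: per-example loss stands in for the model's uncertainty.
    """
    k = max(1, int(round(fraction * len(losses))))
    return np.argsort(losses)[-k:]


def augment_targeted(images, idx):
    """Flip only the selected images horizontally; leave the rest untouched.

    Assumption: `images` has shape (N, H, W); the flip is an illustrative
    augmentation, not one taken from the paper.
    """
    out = images.copy()
    out[idx] = images[idx][:, :, ::-1]
    return out


def loss_slope(loss_history):
    """Least-squares slope of each example's loss across epochs.

    `loss_history` has shape (epochs, examples). A more negative slope
    means the example is being learned faster.
    """
    epochs = np.arange(loss_history.shape[0])
    # polyfit fits every column of a 2-D target; row 0 holds the slopes.
    return np.polyfit(epochs, loss_history, deg=1)[0]


# Toy data for illustration only (not results from the paper).
toy_losses = np.array([0.1, 0.9, 0.5, 2.0])
toy_idx = select_uncertain(toy_losses, fraction=0.5)

toy_images = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)
toy_augmented = augment_targeted(toy_images, toy_idx)

# Column 0: loss falls every epoch. Column 1: loss stays flat.
toy_history = np.array([[2.0, 2.0], [1.5, 2.0], [1.0, 2.0]])
toy_slopes = loss_slope(toy_history)
```

Under these assumptions, one could compare the slopes of the augmented subset across groups of examples: a loss that falls once extra information is supplied behaves differently from one that stays flat.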
Videos
#92 - SARA HOOKER - Fairness, Interpretability, Language Models · YouTube
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
