Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi

TL;DR
This paper introduces a taxonomy of uncertainty in vision-language AI, creates a large benchmark dataset called CertainlyUncertain for evaluating uncertainty awareness, and proposes a new confidence-weighted accuracy metric.
Contribution
It provides a novel taxonomy of epistemic and aleatoric uncertainty, a large benchmark dataset with contrastive VQA samples, and a new metric for assessing uncertainty awareness in AI systems.
Findings
The taxonomy distinguishes between epistemic and aleatoric uncertainty.
The benchmark dataset contains 178K contrastive VQA samples.
The confidence-weighted accuracy correlates well with accuracy and calibration error.
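For readers unfamiliar with calibration error, the sketch below computes expected calibration error (ECE), one common calibration measure. It is an illustration only: this page does not say which calibration measure the paper uses or how confidence-weighted accuracy is defined, and the function name, the equal-width binning, and the default of 10 bins are assumptions made for this example.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Illustrative ECE with equal-width confidence bins (an assumed setup)."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hit_sum = [0.0] * n_bins    # number of correct answers per bin
    count = [0] * n_bins        # number of samples per bin

    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1

    ece = 0.0
    for i in range(n_bins):
        if count[i] == 0:
            continue
        avg_conf = conf_sum[i] / count[i]
        accuracy = hit_sum[i] / count[i]
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (count[i] / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Toy inputs, not data from the paper.
    print(expected_calibration_error([0.95, 0.95, 0.65, 0.65], [True, True, True, False]))
```

A lower value means the stated confidences track observed accuracy more closely. A model that is always fully confident and always correct scores 0.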
Abstract
The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and further explore finer categories within. Based on this taxonomy, we synthesize a benchmark dataset, CertainlyUncertain, featuring 178K visual question answering (VQA) samples as contrastive pairs. This is achieved by 1) inpainting images to make previously answerable questions into unanswerable ones; and 2) using image captions to prompt large language models for both answerable and unanswerable questions. Additionally, we introduce a new metric, confidence-weighted accuracy, that is well correlated…
Taxonomy
Topics: Language, Metaphor, and Cognition · Epistemology, Ethics, and Metaphysics · Education and Critical Thinking Development
Methods: Inpainting
