Explicit Abstention Knobs for Predictable Reliability in Video Question Answering
Jorge Ortiz

TL;DR
This paper explores confidence-based abstention in video question answering systems, demonstrating its effectiveness in controlling error rates in-distribution and under distribution shifts, with implications for high-stakes applications.
Contribution
The paper introduces and evaluates confidence thresholding as a way to obtain predictable reliability from video question answering models, especially under distribution shifts.
Findings
Confidence thresholding offers mechanistic control over error rates.
Smooth risk-coverage tradeoffs are achievable by adjusting thresholds.
Control remains robust under distribution shifts.
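To make the findings above concrete, here is a minimal sketch of confidence thresholding and the risk-coverage tradeoff it produces. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy confidence scores, and the rule "answer only when confidence is at least epsilon" are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def selective_metrics(
    confidences: Sequence[float],
    correct: Sequence[bool],
    epsilon: float,
) -> Tuple[float, float]:
    """Return (coverage, risk) for one threshold.

    Assumed rule: the model answers only when confidence >= epsilon and
    abstains otherwise. Coverage is the fraction of questions answered;
    risk is the error rate among the answered questions.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("confidences and correct must be non-empty and equal length")

    answered = [ok for conf, ok in zip(confidences, correct) if conf >= epsilon]
    coverage = len(answered) / len(confidences)
    if not answered:
        # Nothing was answered; report zero risk by convention.
        return 0.0, 0.0
    risk = 1.0 - sum(answered) / len(answered)
    return coverage, risk


def risk_coverage_curve(
    confidences: Sequence[float],
    correct: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Sweep epsilon and collect (epsilon, coverage, risk) triples."""
    return [(eps, *selective_metrics(confidences, correct, eps)) for eps in thresholds]


# Toy data (made up for illustration, not results from the paper).
confidences = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30]
correct = [True, True, True, False, True, False, False, True]

for eps, cov, risk in risk_coverage_curve(confidences, correct, [0.0, 0.5, 0.75, 0.9]):
    print(f"epsilon={eps:.2f}  coverage={cov:.3f}  risk={risk:.3f}")
```

Raising epsilon trades coverage for lower risk on this toy data. Whether the same threshold yields the same risk on shifted data depends on how well the confidence scores stay calibrated, which is the robustness question the paper studies.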
Abstract
High-stakes deployment of vision-language models (VLMs) requires selective prediction, where systems abstain when uncertain rather than risk costly errors. We investigate whether confidence-based abstention provides reliable control over error rates in video question answering, and whether that control remains robust under distribution shift. Using NExT-QA and Gemini 2.0 Flash, we establish two findings. First, confidence thresholding provides mechanistic control in-distribution. Sweeping the threshold epsilon produces smooth risk-coverage tradeoffs, reducing error rates f
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
