Pandora's Regret: A Proper Scoring Rule for Evaluating Sequential Search
Gerardo A. Flores, Yash Deshpande, Jannis R. Brea, Ashia C. Wilson

TL;DR
This paper introduces Pandora's Regret, a new proper scoring rule tailored for sequential search, aligning model evaluation with search utility and improving predictive calibration.
Contribution
It derives Pandora's Regret as a pairwise-additive, strictly proper scoring rule that accounts for search costs and ranking, extending decision-theoretic scoring to multiclass settings.
Findings
Pandora's Regret better predicts clinical diagnostic costs than standard metrics.
Its one-parameter Beta family balances penalties for rank-swapping against penalties for probability magnitude.
Log loss, accuracy, and macro-F1 are shown to rely on implicit decision models that are misaligned with sequential search.
Abstract
In sequential search, alternatives are tested until the true class is found. Standard proper scoring rules like log loss are local, ignoring the ranking of competitors and misaligning model evaluation with search utility. We show that sequential search induces a pairwise structure that overcomes this. By analyzing the expected cost of optimal search under varying testing costs, we derive Pandora's Regret: a closed-form, pairwise-additive, and strictly proper scoring rule. Pandora's Regret both elicits true probabilities and penalizes rank-reversing miscalibrations where distractors outrank the true class. Our construction yields a one-parameter Beta family that balances penalties for rank-swapping versus probability magnitude, while retaining a grounded interpretation as expected search cost. We prove that log loss, accuracy, and macro-F1 rely on implicit decision models misaligned with…
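The locality issue described in the abstract can be illustrated with a small sketch. This is not the paper's closed-form Pandora's Regret, which is not reproduced here. It only simulates the search setting: classes are tested one at a time until the true class is found, and the cost paid is the sum of the testing costs incurred. The function names `search_cost` and `log_loss` are hypothetical, and the testing order (descending ratio of predicted probability to testing cost) is an assumption taken from the classical single-target search problem, not from the paper.

```python
import math

def search_cost(probs, true_class, costs=None):
    """Cost paid when classes are tested in descending probs[i] / costs[i]
    order until true_class is found. The ordering rule is an assumption
    (classical search ordering), not the paper's definition."""
    n = len(probs)
    if costs is None:
        costs = [1.0] * n  # assumption: unit testing cost per class
    # sorted() is stable, so ties keep their original index order.
    order = sorted(range(n), key=lambda i: probs[i] / costs[i], reverse=True)
    total = 0.0
    for i in order:
        total += costs[i]
        if i == true_class:
            return total
    raise ValueError("true_class must be a valid class index")

def log_loss(probs, true_class):
    """Standard log loss: depends only on the probability of the true class."""
    return -math.log(probs[true_class])

# Two predictions that assign the same probability (0.3) to the true class 0
# but distribute the remaining mass over the competitors differently.
pred_a = [0.3, 0.35, 0.35]  # both distractors outrank the true class
pred_b = [0.3, 0.6, 0.1]    # only one distractor outranks the true class

print(log_loss(pred_a, 0) == log_loss(pred_b, 0))  # True: log loss cannot tell them apart
print(search_cost(pred_a, 0))  # 3.0: two wasted tests before the true class
print(search_cost(pred_b, 0))  # 2.0: one wasted test before the true class
```

Under these assumptions, log loss assigns both predictions the same score, while the simulated search cost differs because it depends on how many distractors are ranked above the true class. Passing a non-uniform `costs` list changes the testing order, which loosely mirrors the abstract's "varying testing costs".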
