Reconciling Model Multiplicity for Downstream Decision Making

Ally Yalei Du; Dung Daniel Ngo; Zhiwei Steven Wu

arXiv:2405.19667·cs.LG·May 31, 2024

Open Access 3 Reviews

TL;DR

This paper addresses the challenge of model multiplicity in decision-making, proposing a calibration framework that aligns predictive models with downstream tasks, improving decision accuracy and model agreement.

Contribution

The paper introduces a novel calibration algorithm that reconciles predictive models for better downstream decision-making, even without direct access to true probability distributions.

Findings

01·Improved downstream decision-making losses with calibrated models

02·Models achieve near-universal agreement on best-response actions

03·Algorithm effective with empirical data sets

Abstract

We consider the problem of model multiplicity in downstream decision-making, a setting where two predictive models of equivalent accuracy cannot agree on the best-response action for a downstream loss function. We show that even when the two predictive models approximately agree on their individual predictions almost everywhere, it is still possible for their induced best-response actions to differ on a substantial portion of the population. We address this issue by proposing a framework that calibrates the predictive models with regard to both the downstream decision-making problem and the individual probability prediction. Specifically, leveraging tools from multi-calibration, we provide an algorithm that, at each time-step, first reconciles the differences in individual probability prediction, then calibrates the updated models such that they are indistinguishable from the true…
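The gap between close predictions and differing best-response actions can be seen in a small toy example. The sketch below is not the paper's construction: the loss table `LOSS`, the two prediction lists, and all function names are hypothetical and chosen only for illustration. It assumes a binary outcome and two actions, where the best response is the action with the lowest expected loss under a model's predicted probability.

```python
# Toy illustration (hypothetical values, not taken from the paper).
# LOSS[action][outcome] for a binary outcome y in {0, 1}.
# Action 0 = "do nothing", action 1 = "intervene".
LOSS = [
    [0.0, 1.0],   # do nothing: costly when y = 1
    [0.25, 0.0],  # intervene: costly when y = 0
]


def expected_loss(action: int, p: float) -> float:
    """Expected loss of `action` when the predicted P(y = 1) is `p`."""
    return (1.0 - p) * LOSS[action][0] + p * LOSS[action][1]


def best_response(p: float) -> int:
    """Action with the lowest expected loss under prediction `p`."""
    return min(range(len(LOSS)), key=lambda a: expected_loss(a, p))


def disagreement_rate(preds_a, preds_b) -> float:
    """Fraction of individuals on whom the two induced actions differ."""
    pairs = list(zip(preds_a, preds_b))
    differing = sum(best_response(pa) != best_response(pb) for pa, pb in pairs)
    return differing / len(pairs)


# Ten individuals. With this loss table the two actions are tied at p = 0.2,
# and the first five individuals get predictions on opposite sides of it.
model_a = [0.19] * 5 + [0.05, 0.10, 0.60, 0.80, 0.90]
model_b = [0.21] * 5 + [0.06, 0.09, 0.61, 0.79, 0.91]

max_gap = max(abs(a - b) for a, b in zip(model_a, model_b))
rate = disagreement_rate(model_a, model_b)
print(f"largest prediction gap: {max_gap:.2f}, action disagreement: {rate:.0%}")
```

In this toy setup no individual prediction differs by more than about 0.02 between the two models, yet the induced actions differ on half of the population, because those individuals sit next to the point where the two actions have equal expected loss.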

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 6·Confidence 3

Strengths

- This paper studies an important and practical problem of model multiplicity. The paper is well organized and clearly presented.
- The illustrative example in Figure 1 is particularly helpful: it directly shows that it is insufficient to only update two predictive models so that they have improved squared loss and nearly agree on their individual predictions almost everywhere.
- Theoretical guarantee shows that the new algorithm ReDCal provides an improved ac

Weaknesses

- The experimental results with the HAM10000 dataset show substantially larger error bars and much less smooth convergence. It would be helpful to provide more details on these differences between the two sets of results.
- The experiments only compare to one other baseline, proposed in (Roth et al 2023). How does the proposed algorithm compare to other related work on model multiplicity?

Reviewer 02·Rating 6·Confidence 4

Strengths

The paper highlights an important problem: improvements to prediction models can hurt downstream decision-making, since downstream decision-makers may have loss functions that do not necessarily align with prediction accuracy. The paper combines existing work in multi-calibration with work in model multiplicity to solve this problem. The algorithm proposed by the paper seems novel and provides what seem to be sensible theoretical guarantees that trade off between improvements to prediction

Weaknesses

The paper has weaknesses in its presentation as well as results that seem somewhat suspicious/hard to interpret precisely. The following items could be addressed and improved for presentation:
- In the paper's introduction, the authors mention calibration several times, but for someone not immediately familiar with the literature it's hard to understand what it is formally. It becomes a little better defined at Lemma 2.6, but having extra background or explanation in the introduction would be h

Reviewer 03·Rating 6·Confidence 4

Strengths

This paper is overall a good and novel contribution to the predictive multiplicity literature. Specifically:
+ The contribution of this paper appears new in the model multiplicity literature: it gives a rigorous method for reconciliation of any two models with provable guarantees in terms of the resulting model loss (for which only one algorithm exists in the literature), while at the same time ensuring in a rigorous way that downstream decision making is not affected negatively (which is new);

Weaknesses

While overall I believe this to be a good-quality paper, there is the following (relatively non-major) consideration that I would call a weakness:
- The paper currently appears written with a primarily theoretical audience in mind, but I think it could still do a better/more thorough job coming up with/describing experiments. It currently gives two semi-synthetic ones. In the first one, linear decision losses are generated in a Gaussian manner, so that the two vision models are essentially b

Taxonomy

Topics·Simulation Techniques and Applications

Methods·Sparse Evolutionary Training