Addressing the Long-term Impact of ML Decisions via Policy Regret

David Lindner; Hoda Heidari; Andreas Krause

arXiv:2106.01325 · cs.LG · June 3, 2021

TL;DR

This paper models communities as bandit arms whose rewards evolve with every pull, first increasing and possibly decreasing later, and uses policy regret to evaluate sequential allocation decisions. It presents an algorithm with sub-linear policy regret that outperforms baselines over long time horizons.

Contribution

The paper proposes a framework that uses policy regret to account for the long-term impact of ML decisions, and provides an algorithm with a proven sub-linear policy regret bound.

Findings

01 The algorithm achieves sub-linear policy regret over long horizons.

02 Empirical results show the algorithm outperforms baselines in long-term scenarios.

03 The approach captures how rewards evolve as the decision-maker repeatedly allocates opportunities to a community.

Abstract

Machine Learning (ML) increasingly informs the allocation of opportunities to individuals and communities in areas such as lending, education, employment, and beyond. Such decisions often impact their subjects' future characteristics and capabilities in an a priori unknown fashion. The decision-maker, therefore, faces exploration-exploitation dilemmas akin to those in multi-armed bandits. Following prior work, we model communities as arms. To capture the long-term effects of ML-based allocation decisions, we study a setting in which the reward from each arm evolves every time the decision-maker pulls that arm. We focus on reward functions that are initially increasing in the number of pulls but may become (and remain) decreasing after a certain point. We argue that an acceptable sequential allocation of opportunities must take an arm's potential for growth into account. We capture these…
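As a rough illustration of the setting the abstract describes, the Python sketch below builds two arms whose reward depends only on how often that arm has been pulled, and compares a myopic choice rule against the best fixed allocation of pulls for the horizon. Everything here is an assumption made for illustration: the function names (make_single_peaked, cumulative_reward, best_allocation, myopic_counts), the piecewise-linear reward shape, and the numeric parameters are hypothetical and are not taken from the paper or its repository. The sketch also assumes deterministic, known reward functions, and it treats policy regret as the gap between the best achievable cumulative reward for the horizon and the reward actually collected; the paper's formal definitions may differ.

```python
from itertools import product


def make_single_peaked(peak, up, down, base=0.0):
    """Hypothetical reward curve: rises by `up` per pull until `peak` pulls,
    then falls by `down` per additional pull."""
    def reward(n_pulls):
        if n_pulls <= peak:
            return base + up * n_pulls
        return base + up * peak - down * (n_pulls - peak)
    return reward


def cumulative_reward(arms, counts):
    """Total reward when arm i is pulled counts[i] times.
    The order of pulls does not matter because each arm's reward
    depends only on that arm's own pull count."""
    return sum(sum(f(n) for n in range(c)) for f, c in zip(arms, counts))


def best_allocation(arms, horizon):
    """Brute-force search over all ways to split `horizon` pulls across arms."""
    best_total, best_counts = None, None
    for counts in product(range(horizon + 1), repeat=len(arms)):
        if sum(counts) != horizon:
            continue
        total = cumulative_reward(arms, counts)
        if best_total is None or total > best_total:
            best_total, best_counts = total, counts
    return best_total, best_counts


def myopic_counts(arms, horizon):
    """Baseline: always pull the arm with the highest immediate reward."""
    counts = [0] * len(arms)
    for _ in range(horizon):
        i = max(range(len(arms)), key=lambda k: arms[k](counts[k]))
        counts[i] += 1
    return tuple(counts)


# Arm 0 pays a flat 0.5; arm 1 starts at 0 but grows with investment.
arms = [
    make_single_peaked(peak=0, up=0.0, down=0.0, base=0.5),
    make_single_peaked(peak=5, up=0.3, down=0.1),
]
horizon = 10

optimal_total, optimal_counts = best_allocation(arms, horizon)
baseline_counts = myopic_counts(arms, horizon)
baseline_total = cumulative_reward(arms, baseline_counts)
policy_regret = optimal_total - baseline_total

print("best allocation:", optimal_counts)
print("myopic allocation:", baseline_counts)
print("policy regret of myopic rule:", round(policy_regret, 6))
```

In this toy instance the myopic rule never pulls the second arm, because that arm's initial reward is lower, so it never discovers the arm's potential for growth. That is the kind of failure the abstract argues an acceptable allocation must avoid.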

Code & Models

Repositories

david-lindner/single-peaked-bandits (Official)