TL;DR
This paper introduces a UCB algorithm for online fair division with bandit learning, achieving near-optimal regret bounds while satisfying proportionality constraints in expectation.
Contribution
It presents the first UCB-based method for online fair division with unknown values, improving regret bounds from O(T^{2/3}) to O(√T).
Findings
Achieves O(√T) regret with high probability.
Guarantees proportionality in expectation under unknown value distributions.
Introduces a two-round linear optimization UCB algorithm for this setting.
Abstract
We study online fair division when there are a finite number of item types and the player values for the items are drawn randomly from distributions with unknown means. In this setting, a sequence of indivisible items arrives according to a random online process, and each item must be allocated to a single player. The goal is to maximize expected social welfare while ensuring that the allocation satisfies proportionality in expectation. When player values are normalized, we show that it is possible, with high probability, to guarantee proportionality constraint satisfaction and achieve O(√T) regret. To achieve this result, we present an upper confidence bound (UCB) algorithm that uses two rounds of linear optimization. This algorithm highlights fundamental aspects of proportionality constraints that allow for a UCB algorithm despite the presence of many (potentially…
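To make the objective and the constraint concrete, here is a minimal Python sketch of what "proportionality in expectation" and "expected social welfare" could mean for a fixed fractional allocation. It is not the paper's algorithm. It assumes the common definition that each of n players should receive at least 1/n of the expected value she would get from receiving every item; the paper's exact formulation may differ. The names `p` (arrival probability of each item type), `mu` (mean value of each player for each type), and `X` (probability that a type is given to a player) are hypothetical, and the means are treated as known here, whereas the paper must learn them.

```python
# Illustrative sketch only: hypothetical names, assumed definition of
# proportionality, and known means (the paper's setting has unknown means).

def expected_values(p, mu, X):
    """Per-round expected value each player receives under allocation X."""
    n, m = len(mu), len(p)
    return [sum(p[k] * X[i][k] * mu[i][k] for k in range(m)) for i in range(n)]

def is_proportional(p, mu, X, tol=1e-9):
    """True if every player gets at least 1/n of her value for all items."""
    n, m = len(mu), len(p)
    received = expected_values(p, mu, X)
    for i in range(n):
        fair_share = sum(p[k] * mu[i][k] for k in range(m)) / n
        if received[i] < fair_share - tol:
            return False
    return True

def social_welfare(p, mu, X):
    """Sum of the players' per-round expected values."""
    return sum(expected_values(p, mu, X))

# Two players, two item types that arrive with equal probability.
p = [0.5, 0.5]
mu = [[1.0, 0.0],   # player 0 only values type 0
      [0.5, 0.5]]   # player 1 values both types equally

split = [[1.0, 0.0], [0.0, 1.0]]      # type 0 -> player 0, type 1 -> player 1
all_to_0 = [[1.0, 1.0], [0.0, 0.0]]   # everything goes to player 0

print(is_proportional(p, mu, split), social_welfare(p, mu, split))
print(is_proportional(p, mu, all_to_0), social_welfare(p, mu, all_to_0))
```

In this toy instance the split allocation satisfies the constraint, while giving everything to player 0 leaves player 1 below her fair share and also yields lower welfare.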