Contextual Linear Optimization with Partial Feedback
Yichun Hu, Nathan Kallus, Xiaojie Mao, Yanchen Wu

TL;DR
This paper develops a unified offline learning framework for contextual linear optimization under partial feedback, providing new regret bounds and practical algorithms for decision-making with incomplete information.
Contribution
The paper introduces an approach to CLO with partial feedback based on induced empirical risk minimization (IERM), together with fast-rate regret bounds and surrogate losses that keep the optimization computationally tractable.
Findings
Proposed a unified offline learning algorithm for CLO with partial feedback.
Derived fast-rate regret bounds for the IERM framework.
Validated methods through stochastic shortest path experiments on real and simulated data.
Abstract
Contextual linear optimization (CLO) uses predictive contextual features to reduce uncertainty in random cost coefficients in the objective and thereby improve decision-making performance. A canonical example is the stochastic shortest path problem with random edge costs (e.g., travel time) and contextual features (e.g., lagged traffic, weather). While existing work on CLO assumes fully observed cost coefficient vectors, in many applications the decision maker observes only partial feedback corresponding to each chosen decision in the history. In this paper, we study both a bandit-feedback setting (e.g., only the overall travel time of each historical path is observed) and a semi-bandit-feedback setting (e.g., travel times of the individual segments on each chosen path are additionally observed). We propose a unified class of offline learning algorithms for CLO with different types of…
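The two feedback settings described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm; it only shows what the learner gets to observe in each setting. The four-edge network, the `PATHS` table, and the cost model in `draw_edge_costs` are hypothetical choices made for illustration.

```python
import random

# Hypothetical toy network: 4 edges, two feasible source-to-sink paths.
# Each decision z is a 0/1 vector marking which edges the path uses.
PATHS = {
    "upper": [1, 1, 0, 0],
    "lower": [0, 0, 1, 1],
}

def draw_edge_costs(context, rng):
    """Assumed cost model: travel time scales with a scalar context plus small noise."""
    base = [1.0, 2.0, 1.5, 1.0]
    return [b * (1.0 + context) + rng.uniform(0.0, 0.1) for b in base]

def bandit_feedback(z, y):
    """Bandit setting: only the total cost of the chosen path is observed."""
    return sum(zj * yj for zj, yj in zip(z, y))

def semi_bandit_feedback(z, y):
    """Semi-bandit setting: costs of the edges on the chosen path are observed individually."""
    return {j: yj for j, (zj, yj) in enumerate(zip(z, y)) if zj == 1}

rng = random.Random(0)
context = 0.5                      # e.g. a lagged traffic level (assumed scalar)
y = draw_edge_costs(context, rng)  # full cost vector, never revealed to the learner
z = PATHS["upper"]                 # the decision taken in the historical data

total = bandit_feedback(z, y)
per_edge = semi_bandit_feedback(z, y)
print("bandit observation:", total)
print("semi-bandit observation:", per_edge)
```

In both cases the costs of edges off the chosen path stay hidden, which is what separates these settings from the fully observed cost vectors assumed in earlier CLO work.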
Taxonomy
Topics: Advanced Bandit Algorithms Research
