Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits
Jiabin Lin, Shana Moothedath, Namrata Vaswani

TL;DR
This paper introduces a multi-task learning algorithm for stochastic contextual bandits that leverages shared low-rank representations to improve learning efficiency, supported by theoretical regret bounds and experimental comparisons.
Contribution
The paper proposes a novel algorithm combining alternating projected gradient descent and a minimization estimator to recover low-rank feature matrices in multi-task contextual bandits.
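To make the estimator concrete, here is a minimal NumPy sketch of an alternating projected GD and minimization scheme. It rests on several assumptions. Each task t observes n contexts X_t (an n × d matrix) and noiseless rewards y_t = X_t U b_t, where U (d × r, orthonormal columns) is the shared representation. The "minimization" step solves one least-squares problem per task for b_t given U. The "projected GD" step takes one gradient step on the total squared loss with respect to U and then re-orthonormalizes with a QR decomposition. The spectral initialization, the step-size rule `eta / (n * ||B||_2^2)`, and the function names `altgdmin`, `min_step` and `subspace_dist` are hypothetical illustrative choices, not the paper's exact algorithm or constants.

```python
import numpy as np


def min_step(X, Y, U):
    """Minimization step: per-task least squares for b_t given the shared U."""
    return np.stack(
        [np.linalg.lstsq(X[t] @ U, Y[t], rcond=None)[0] for t in range(X.shape[0])],
        axis=1,
    )  # shape (r, T)


def altgdmin(X, Y, r, eta=0.5, iters=300):
    """Sketch of alternating projected GD + minimization (illustrative, not the paper's code).

    Assumed model: Y[t] = X[t] @ U @ b_t, with U of shape (d, r) and orthonormal columns.
    X has shape (T, n, d): n contexts of dimension d for each of T tasks.
    Y has shape (T, n): observed rewards for each task.
    """
    T, n, d = X.shape
    # Spectral initialization: top-r left singular vectors of [X_t^T y_t / n]_t.
    theta0 = np.stack([X[t].T @ Y[t] for t in range(T)], axis=1) / n  # (d, T)
    U = np.linalg.svd(theta0, full_matrices=False)[0][:, :r]
    for _ in range(iters):
        B = min_step(X, Y, U)
        # Gradient (up to a factor of 2) of sum_t ||X_t U b_t - y_t||^2 w.r.t. U.
        grad = np.zeros_like(U)
        for t in range(T):
            resid = X[t] @ (U @ B[:, t]) - Y[t]
            grad += np.outer(X[t].T @ resid, B[:, t])
        # Projected GD: gradient step, then QR to restore orthonormal columns.
        step = eta / (n * np.linalg.norm(B, 2) ** 2)
        U, _ = np.linalg.qr(U - step * grad)
    return U, min_step(X, Y, U)


def subspace_dist(U_hat, U_star):
    """Spectral norm of (I - U_hat U_hat^T) U_star; 0 means identical column spaces."""
    return np.linalg.norm(U_star - U_hat @ (U_hat.T @ U_star), 2)


if __name__ == "__main__":
    # Synthetic, noiseless instance with a shared rank-r representation.
    rng = np.random.default_rng(0)
    T, n, d, r = 100, 40, 20, 2
    U_star = np.linalg.qr(rng.standard_normal((d, r)))[0]
    B_star = rng.standard_normal((r, T))
    X = rng.standard_normal((T, n, d))
    Y = np.einsum("tnd,dt->tn", X, U_star @ B_star)
    U_hat, B_hat = altgdmin(X, Y, r)
    print("subspace distance:", subspace_dist(U_hat, U_star))
```

The demo uses n ≥ d samples per task to keep it well-conditioned. The benefit of sharing U matters most when each task has fewer than d samples, and this sketch does not attempt to certify that regime.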
Findings
The algorithm achieves a regret bound that improves on those of traditional methods.
Experimental results show improved sample efficiency and performance over benchmarks.
The approach effectively leverages shared representations across multiple bandit tasks.
Abstract
We study how representation learning can improve the learning efficiency of contextual bandit problems. We consider the setting where we play T contextual linear bandits of dimension d simultaneously, and these T bandit tasks share a common linear representation whose dimension r is much smaller than d. We present a new algorithm based on alternating projected gradient descent (GD) and a minimization estimator to recover a low-rank feature matrix. Using the proposed estimator, we present a multi-task learning algorithm for linear contextual bandits and prove a regret bound for it. Finally, we present experiments comparing the performance of our algorithm against benchmark algorithms.
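The setting in the abstract can be made concrete with a small simulation. The sketch below rests on several assumptions. Contexts are Gaussian, each task sees K candidate contexts per round, and reward noise is Gaussian. The policy is a simple explore-then-commit rule. Optionally, the per-task least-squares estimates are projected onto their top-r singular subspace, a crude stand-in for a shared-representation estimator. None of these choices, nor the hypothetical names `make_problem` and `run_etc`, come from the paper. The point is only to show how T tasks with parameters theta_t = U b_t are played in parallel and how regret, summed over tasks and rounds, is measured.

```python
import numpy as np


def make_problem(d, r, T, seed=0):
    """Hypothetical instance: T task parameters theta_t = U b_t sharing a d x r U."""
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((d, r)))[0]  # orthonormal shared representation
    B = rng.standard_normal((r, T))                    # task-specific low-dim weights
    return U @ B  # Theta, shape (d, T); column t is theta_t


def run_etc(Theta, K, horizon, explore, r=None, noise=0.1, seed=1):
    """Explore-then-commit on T linear contextual bandits played simultaneously.

    Each round, every task t sees K contexts (rows of a K x d matrix), picks one,
    and receives reward <x, theta_t> + Gaussian noise. The first `explore` rounds
    pick uniformly at random; afterwards each task acts greedily on its estimate.
    If r is given, the stacked estimates are truncated to rank r via SVD.
    Returns the cumulative regret summed over all tasks and rounds.
    """
    rng = np.random.default_rng(seed)
    d, T = Theta.shape
    Xs = [[] for _ in range(T)]  # chosen contexts per task
    ys = [[] for _ in range(T)]  # observed rewards per task
    Theta_hat = None
    regret = 0.0
    for s in range(horizon):
        if s == explore:
            # Per-task least squares on the exploration data.
            Theta_hat = np.stack(
                [np.linalg.lstsq(np.array(Xs[t]), np.array(ys[t]), rcond=None)[0]
                 for t in range(T)],
                axis=1,
            )
            if r is not None:
                # Keep only the top-r singular directions shared across tasks.
                Uh, S, Vt = np.linalg.svd(Theta_hat, full_matrices=False)
                Theta_hat = (Uh[:, :r] * S[:r]) @ Vt[:r]
        for t in range(T):
            C = rng.standard_normal((K, d))  # this round's candidate contexts for task t
            means = C @ Theta[:, t]          # expected reward of each candidate
            if Theta_hat is None:
                a = rng.integers(K)          # exploration: uniform random choice
            else:
                a = int(np.argmax(C @ Theta_hat[:, t]))  # commit: greedy on estimate
            Xs[t].append(C[a])
            ys[t].append(means[a] + noise * rng.standard_normal())
            regret += means.max() - means[a]  # gap to the best context this round
    return regret


if __name__ == "__main__":
    Theta = make_problem(d=10, r=2, T=4)
    print("ETC regret:", run_etc(Theta, K=5, horizon=200, explore=15, r=2))
```

Explore-then-commit is used only because it is the shortest policy that exposes the multi-task structure. It is not the algorithm analyzed in the paper.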
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Data Stream Mining Techniques
