Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits
Adam Bayley, Xiaodan Zhu, Raquel Aoki, Yanshuai Cao, Kevin H. Wilson

TL;DR
This paper evaluates LLM-initialized bandits and finds that their benefit depends heavily on how well LLM-generated preferences align with real user preferences and on how much noise the synthetic data contains.
Contribution
It provides a theoretical framework and empirical analysis of when LLM-based warm-starts outperform cold-start bandits, considering noise and alignment issues.
Findings
Warm-starting remains effective up to 30% noise corruption.
Warm-starting loses its advantage around 40% noise and hurts performance beyond 50%.
Systematic misalignment can cause higher regret than cold-starts.
Abstract
The recent advancement of Large Language Models (LLMs) offers new opportunities to generate user preference data to warm-start bandits. Recent studies on contextual bandits with LLM initialization (CBLI) have shown that these synthetic priors can significantly lower early regret. However, these findings assume that LLM-generated choices are reasonably aligned with actual user preferences. In this paper, we systematically examine how LLM-generated preferences perform when random and label-flipping noise is injected into the synthetic training data. For aligned domains, we find that warm-starting remains effective up to 30% corruption, loses its advantage around 40%, and degrades performance beyond 50%. When there is systematic misalignment, even without added noise, LLM-generated priors can lead to higher regret than a cold-start bandit. To explain these behaviors, we develop a…
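The warm-start and noise-injection setup in the abstract can be illustrated with a minimal sketch. It rests on assumptions that do not come from the paper. The learner is a disjoint LinUCB, which is our stand-in; the CBLI setup may use a different algorithm. Each LLM-simulated choice becomes a binary reward for a (context, arm) pair. "Random" noise redraws a corrupted label uniformly from {0, 1}, and "label-flipping" noise inverts it (0 ↔ 1). The names `LinUCB`, `corrupt`, and `warm_start` are hypothetical.

```python
import numpy as np


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (illustrative stand-in)."""

    def __init__(self, n_arms, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha  # width of the exploration bonus
        # Per-arm Gram matrices start as lam * I (ridge prior); b holds reward-weighted contexts.
        self.A = np.stack([lam * np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))

    def update(self, arm, x, reward):
        # Standard rank-one ridge update for the chosen arm.
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

    def select(self, x):
        # Score each arm by estimated reward plus an upper-confidence bonus.
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))


def corrupt(rewards, noise_rate, mode, rng):
    """Corrupt a fraction `noise_rate` of binary synthetic labels (hypothetical noise models)."""
    rewards = np.asarray(rewards, dtype=float).copy()
    mask = rng.random(rewards.shape[0]) < noise_rate
    if mode == "random":
        # Redraw each corrupted label uniformly from {0, 1}.
        rewards[mask] = rng.integers(0, 2, size=mask.sum())
    elif mode == "flip":
        # Invert each corrupted label.
        rewards[mask] = 1.0 - rewards[mask]
    else:
        raise ValueError(f"unknown noise mode: {mode}")
    return rewards


def warm_start(bandit, contexts, arms, rewards):
    """Feed synthetic (LLM-generated) interactions into the bandit before any real users."""
    for x, a, r in zip(contexts, arms, rewards):
        bandit.update(int(a), x, float(r))
    return bandit
```

A usage example with placeholder random data standing in for LLM-generated labels:

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
arms = rng.integers(0, 3, size=500)
llm_rewards = rng.integers(0, 2, size=500)  # placeholder for LLM-simulated choices
noisy = corrupt(llm_rewards, noise_rate=0.3, mode="flip", rng=rng)
bandit = warm_start(LinUCB(n_arms=3, dim=5), X, arms, noisy)
```

A cold-start baseline is the same `LinUCB` with no `warm_start` call. Comparing the two while varying `noise_rate` mirrors the kind of sweep the abstract describes.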
