Survey: Multi-Armed Bandits Meet Large Language Models
Djallel Bouneffouf, Raphael Feraud

TL;DR
This survey reviews how bandit algorithms and large language models can mutually enhance each other, highlighting recent research, challenges, and future opportunities in integrating these AI techniques for improved decision-making and natural language processing.
Contribution
It provides a comprehensive overview of the intersection between bandit algorithms and large language models, identifying key methods, challenges, and potential for future research.
Findings
Bandit algorithms improve LLM fine-tuning and prompt engineering.
LLMs enhance bandit decision-making with natural language reasoning.
The survey highlights open challenges and future research directions.
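To make the first finding concrete, here is a minimal sketch of how a bandit algorithm could choose among candidate prompt templates. It is an illustration only and is not taken from the survey: the choice of UCB1, the function names `ucb1_select` and `run_prompt_bandit`, and the simulated success rates are all assumptions. A real system would replace the simulated Bernoulli reward with a measured quality signal from the LLM's responses.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Return the index of the arm (prompt template) with the highest UCB1 score."""
    # Play every arm once before relying on the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_prompt_bandit(success_rates, rounds=2000, seed=0):
    """Simulate prompt selection with Bernoulli rewards; return pulls per arm."""
    rng = random.Random(seed)
    k = len(success_rates)
    counts = [0] * k   # how often each template was tried
    sums = [0.0] * k   # total reward collected per template
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, sums, t)
        # Assumption: reward is 1 if the (simulated) response was judged good.
        reward = 1.0 if rng.random() < success_rates[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Hypothetical success rates for three prompt templates (made-up numbers).
HYPOTHETICAL_RATES = [0.2, 0.5, 0.8]
pull_counts = run_prompt_bandit(HYPOTHETICAL_RATES)
```

The exploration bonus shrinks as a template is tried more often, so the selection gradually concentrates on the template with the highest observed reward while still occasionally revisiting the others.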
Abstract
Bandit algorithms and Large Language Models (LLMs) have emerged as powerful tools in artificial intelligence, each addressing distinct yet complementary challenges in decision-making and natural language processing. This survey explores the synergistic potential between these two fields, highlighting how bandit algorithms can enhance the performance of LLMs and how LLMs, in turn, can provide novel insights for improving bandit-based decision-making. We first examine the role of bandit algorithms in optimizing LLM fine-tuning, prompt engineering, and adaptive response generation, focusing on their ability to balance exploration and exploitation in large-scale learning tasks. Subsequently, we explore how LLMs can augment bandit algorithms through advanced contextual understanding, dynamic adaptation, and improved policy selection using natural language reasoning. By providing a…
