Online Personalizing White-box LLMs Generation with Neural Bandits
Zekai Chen, Weeden Daniel, Po-yu Chen, Francois Buet-Golfouse

TL;DR
This paper presents an online neural bandit method that personalizes open-ended text generation in white-box LLMs by adapting to user feedback on the fly, without training a separate model for each user.
Contribution
It introduces a neural bandit-based method for real-time personalization of LLM outputs that optimizes soft instruction embeddings from user feedback and outperforms baseline strategies across the evaluated tasks.
Findings
Up to 62.9% improvement in best ROUGE scores over the baseline for personalized news headline generation (NeuralTS)
Up to 2.76% higher LLM-agent evaluation scores than the baseline
Effective online adaptation to user feedback in personalization tasks
Abstract
The advent of personalized content generation by LLMs presents a novel challenge: how to efficiently adapt text to meet individual preferences without the unsustainable demand of creating a unique model for each user. This study introduces an innovative online method that employs neural bandit algorithms to dynamically optimize soft instruction embeddings based on user feedback, enhancing the personalization of open-ended text generation by white-box LLMs. Through rigorous experimentation on various tasks, we demonstrate significant performance improvements over baseline strategies. NeuralTS, in particular, leads to substantial enhancements in personalized news headline generation, achieving up to a 62.9% improvement in terms of best ROUGE scores and up to 2.76% increase in LLM-agent evaluation against the baseline.
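The feedback loop described in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's NeuralTS implementation: the neural reward model is replaced here by independent per-arm Gaussian Thompson sampling, each "arm" stands in for one candidate soft instruction embedding, and the user feedback is simulated as a noisy score. The function names, the exploration scale `nu`, and the reward values are all assumptions made for illustration.

```python
import math
import random


def thompson_select(stats, nu, rng):
    """Draw one plausible reward per arm and return the arm with the largest draw."""
    best_arm, best_sample = 0, -math.inf
    for arm, (n, total) in enumerate(stats):
        mean = total / (n + 1)        # running mean, shrunk toward 0 by one pseudo-observation
        std = nu / math.sqrt(n + 1)   # uncertainty shrinks as the arm is tried more often
        sample = rng.gauss(mean, std)
        if sample > best_sample:
            best_arm, best_sample = arm, sample
    return best_arm


def run_bandit(true_means, rounds=2000, nu=0.3, noise=0.1, seed=0):
    """Simulate the online loop and return how often each arm was chosen.

    true_means[i] is the (hidden) average feedback score for hypothetical
    candidate embedding i; in a real system this would come from the user.
    """
    rng = random.Random(seed)
    stats = [[0, 0.0] for _ in true_means]  # per arm: [pull count, reward sum]
    for _ in range(rounds):
        arm = thompson_select(stats, nu, rng)
        reward = rng.gauss(true_means[arm], noise)  # simulated user feedback
        stats[arm][0] += 1
        stats[arm][1] += reward
    return [n for n, _ in stats]


if __name__ == "__main__":
    counts = run_bandit([0.2, 0.5, 0.8])
    print("pulls per candidate embedding:", counts)
```

In this toy setting the sampler concentrates its choices on the candidate with the highest average feedback while still occasionally trying the others. A neural variant would instead predict the reward from the embedding itself, which lets feedback on one candidate inform estimates for nearby ones.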
Taxonomy
Topics: Digital Rights Management and Security
