Dynamic Policy Fusion for User Alignment Without Re-Interaction
Ajsal Shereef Palattuparambil, Thommen George Karimpanal, Santu Rana

TL;DR
This paper introduces a zero-shot method for adapting pre-trained reinforcement learning policies to individual user preferences by inferring user intent from trajectory feedback and fusing it dynamically, avoiding retraining or extra environment interactions.
Contribution
The paper presents a theoretically grounded dynamic policy fusion technique that uses trajectory-level human feedback to align a pre-trained RL policy with user preferences, without additional environment interactions.
Findings
Consistently achieves task goals while respecting user preferences.
Operates without additional environment interactions, enabling zero-shot adaptation.
Effective across multiple environments.
Abstract
Deep reinforcement learning (RL) policies, although optimal in terms of task rewards, may not align with the personal preferences of human users. To ensure this alignment, a naive solution would be to retrain the agent using a reward function that encodes the user's specific preferences. However, such a reward function is typically not readily available, and retraining the agent from scratch can be prohibitively expensive. We propose a more practical approach: adapting the already trained policy to user-specific needs with the help of human feedback. To this end, we infer the user's intent through trajectory-level feedback and combine it with the trained task policy via a theoretically grounded dynamic policy fusion approach. As our approach collects human feedback on the very same trajectories used to learn the task policy, it does not require any additional interactions with…
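To make the idea of fusing a task policy with an inferred intent policy concrete, here is a minimal sketch for a single state with discrete actions. It is not the paper's fusion rule, which the abstract does not spell out. It assumes a simple normalized weighted product of the two action distributions, and the names `fuse_policies` and `intent_weight` are hypothetical. A dynamic variant would adjust `intent_weight` per state, which this sketch leaves to the caller.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def fuse_policies(task_probs, intent_probs, intent_weight=1.0, eps=1e-12):
    """Hypothetical fusion rule (an assumption, not the paper's method):
    fused(a) is proportional to task(a) * intent(a) ** intent_weight.
    intent_weight = 0 recovers the task policy; larger values favor the user."""
    task_probs = np.asarray(task_probs, dtype=float)
    intent_probs = np.asarray(intent_probs, dtype=float)
    # Work in log space, then renormalize with a softmax.
    log_fused = np.log(task_probs + eps) + intent_weight * np.log(intent_probs + eps)
    return softmax(log_fused)

# Illustrative numbers only: the task policy slightly prefers action 0,
# while the inferred user intent strongly prefers action 1.
task = softmax([2.0, 1.9, 0.1])
intent = softmax([0.0, 3.0, 0.0])
fused = fuse_policies(task, intent, intent_weight=1.0)
```

In this toy case both actions 0 and 1 are nearly equally good for the task, so the intent term tips the fused policy toward the action the user prefers without any further environment interaction.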
Taxonomy
Topics: Access Control and Trust · Healthcare innovation and challenges
Methods: ALIGN
