Similarity as Reward Alignment: Robust and Versatile Preference-based Reinforcement Learning