Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment