CPO: Addressing Reward Ambiguity in Role-playing Dialogue via Comparative Policy Optimization