A First-Order Logic-Based Alternative to Reward Models in RLHF