Answer First, Reason Later: Aligning Search Relevance via Mode-Balanced Reinforcement Learning