Debate as Reward: A Multi-Agent Reward System for Scientific Ideation via RL Post-Training