GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning | Tomesphere