SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization