Distill and Align Decomposition for Enhanced Claim Verification
Jabez Magomere, Elena Kochkina, Samuel Mensah, Simerjot Kaur, Fernando Acero, Arturo Oncevay, Charese H. Smiley, Xiaomo Liu, Manuela Veloso

TL;DR
This paper introduces a reinforcement learning method that jointly optimizes decomposition quality and verifier alignment, improving downstream verification macro-F1 and subclaim quality across six evaluation settings.
Contribution
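It presents a GRPO-based RL framework that combines structured sequential reasoning, supervised finetuning on teacher-distilled exemplars, and a multi-objective reward (format compliance, verifier alignment, decomposition quality), outperforming existing prompt-based and RL methods.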
Findings
The trained 8B decomposer raises downstream claim verification performance to 71.75% macro-F1 across six evaluation settings.
It outperforms prompt-based approaches (+1.99, +6.24) and existing RL methods (+5.84).
Human evaluation confirms high subclaim quality.
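For readers unfamiliar with the metric, macro-F1 is the unweighted mean of per-class F1 scores, so every class counts equally regardless of how often it occurs. The sketch below is a minimal illustration only: the function name `macro_f1` and the label strings are hypothetical, and the paper's actual label set and evaluation code are not shown on this page.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = defaultdict(int)  # true positives per label
    fp = defaultdict(int)  # false positives per label
    fn = defaultdict(int)  # false negatives per label

    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed the gold label g

    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)

    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels, chosen only for illustration.
gold = ["supported", "supported", "refuted", "refuted"]
pred = ["supported", "refuted", "refuted", "refuted"]
score = macro_f1(gold, pred)
```

Here the "supported" class has F1 = 2/3 and the "refuted" class has F1 = 4/5, so the macro average is their mean.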
Abstract
Complex claim verification requires decomposing sentences into verifiable subclaims, yet existing methods struggle to align decomposition quality with verification performance. We propose a reinforcement learning (RL) approach that jointly optimizes decomposition quality and verifier alignment using Group Relative Policy Optimization (GRPO). Our method integrates: (i) structured sequential reasoning; (ii) supervised finetuning on teacher-distilled exemplars; and (iii) a multi-objective reward balancing format compliance, verifier alignment, and decomposition quality. Across six evaluation settings, our trained 8B decomposer improves downstream verification performance to 71.75% macro-F1, outperforming prompt-based approaches (+1.99, +6.24) and existing RL methods (+5.84). Human evaluation confirms the high quality of the generated subclaims. Our framework enables smaller…
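As a rough illustration of how a multi-objective reward can feed GRPO, the sketch below combines three component scores into one scalar and then computes group-relative advantages, i.e. each sampled decomposition's reward normalized by the mean and standard deviation of its sampling group. Everything specific here is an assumption made for illustration: the `RewardParts` fields, the weights, the [0, 1] score ranges, and the use of a weighted sum are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RewardParts:
    """Hypothetical component scores for one sampled decomposition, each in [0, 1]."""
    format_compliance: float
    verifier_alignment: float
    decomposition_quality: float


def combined_reward(parts, weights=(0.2, 0.5, 0.3)):
    """Weighted sum of the three components. The weights are illustrative only."""
    w_format, w_verifier, w_quality = weights
    return (
        w_format * parts.format_compliance
        + w_verifier * parts.verifier_alignment
        + w_quality * parts.decomposition_quality
    )


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: (reward - group mean) / (group std + eps).

    Population standard deviation is used here; implementations differ on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group: several decompositions sampled for the same input claim.
group = [
    RewardParts(1.0, 1.0, 0.8),
    RewardParts(1.0, 0.0, 0.6),
    RewardParts(0.0, 0.0, 0.2),
]
rewards = [combined_reward(p) for p in group]
advantages = group_relative_advantages(rewards)
```

Samples that score above their group's average receive a positive advantage and are reinforced; those below receive a negative one. When every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.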
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Hate Speech and Cyberbullying Detection
