TL;DR
RioRAG introduces a verifiable reward framework for long-form retrieval-augmented generation, improving factual accuracy and stability without relying on handcrafted supervision or strong teacher models.
Contribution
It defines a measurable informativeness objective and employs nugget-centric verification to stabilize RL training in LFQA tasks.
Findings
RioRAG achieves higher factual recall and faithfulness.
It stabilizes optimization through dense, verifiable rewards.
The framework outperforms existing methods on LongFact and RAGChecker.
Abstract
Long-form question answering (LFQA) requires open-ended long-form responses that synthesize coherent, factually grounded content from multi-source evidence. This makes reinforcement learning (RL) reward design critical. The reward must be verifiable for faithful grounding and stable optimization. However, many standard rewards assume a unique target with an exact-match notion of correctness, which fits short-form QA and math but breaks in LFQA. As a result, current RAG systems still lack verifiable reward mechanisms, yielding unstable feedback signals and suboptimal optimization outcomes. We propose RioRAG, a framework for reinforced verifiable informativeness optimization. First, it defines informativeness as a measurable and externally verifiable objective for RL. Second, RioRAG uses nugget-centric verification with cross-source checks to enable self-evolution of smaller LLMs and to…
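To make the idea of a verifiable informativeness reward concrete, here is a minimal sketch of a nugget-coverage score with a cross-source check. It is an illustration under stated assumptions, not RioRAG's actual implementation: the function names, the token-overlap matcher, and the `threshold` and `min_sources` parameters are all hypothetical. A real system would likely replace the toy matcher with a learned or LLM-based verifier.

```python
import re


def _tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def supports(text, nugget, threshold=0.75):
    """Toy verifier (assumption): the text supports a nugget when at least
    `threshold` of the nugget's tokens also occur in the text."""
    nugget_tokens = _tokens(nugget)
    if not nugget_tokens:
        return False
    overlap = len(nugget_tokens & _tokens(text))
    return overlap / len(nugget_tokens) >= threshold


def verified_nuggets(nuggets, sources, min_sources=2):
    """Cross-source check (assumption): keep only nuggets supported by at
    least `min_sources` of the retrieved documents."""
    return [
        n for n in nuggets
        if sum(supports(s, n) for s in sources) >= min_sources
    ]


def informativeness_reward(response, nuggets, sources, min_sources=2):
    """Fraction of cross-verified nuggets that the response covers.
    Returns 0.0 when no nugget survives verification."""
    kept = verified_nuggets(nuggets, sources, min_sources)
    if not kept:
        return 0.0
    covered = sum(supports(response, n) for n in kept)
    return covered / len(kept)
```

Because the score is computed from checkable units rather than an exact match against a single target, it stays defined for open-ended answers, which is the property the abstract argues LFQA rewards need.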
