Reinforce to Learn, Elect to Reason: A Dual Paradigm for Video Reasoning

Songyuan Yang; Weijiang Yu; Jilin Ma; Ziyu Liu; Guijian Tang; Wenjing Yang; Huibin Tan; Nong Xiao

arXiv:2604.04379 · cs.CV · April 7, 2026

TL;DR

RLER introduces a dual paradigm for video reasoning that explicitly incorporates evidence during learning and inference, significantly improving reliability and interpretability of large multimodal models.

Contribution

The paper proposes RLER, a novel framework that decouples evidence generation from answer inference using reinforcement learning and evidence-based election, achieving state-of-the-art results.

Findings

01

RLER outperforms existing models on 8 benchmarks, with an average improvement of 6.3%.

02

At inference, RLER uses an average of 3.1 candidates per question, balancing compute and quality.

03

Explicit evidence modeling enhances trustworthiness and interpretability in video reasoning.

Abstract

Video reasoning has advanced with large multimodal models (LMMs), yet their inference is often a single pass that returns an answer without verifying whether the reasoning is evidence-aligned. We introduce Reinforce to Learn, Elect to Reason (RLER), a dual paradigm that decouples learning to produce evidence from obtaining a reliable answer. In RLER-Training, we optimize the policy with group-relative reinforcement learning (RL) and 3 novel task-driven rewards: Frame-sensitive reward grounds reasoning on explicit key frames, Think-transparency reward shapes readable and parsable reasoning traces, and Anti-repetition reward boosts information density. These signals teach the model to emit structured, machine-checkable evidence and potentiate reasoning capabilities. In RLER-Inference, we apply a train-free orchestrator that generates a small set of diverse candidates, parses their answers…
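To make the training signal concrete, here is a minimal sketch of how group-relative advantages might be computed from a combined reward. It assumes the commonly used formulation in which each sampled response's reward is normalized by the mean and standard deviation of its own sampling group. The abstract does not state the exact formula, so the function names, the weighted-sum combination, and the unit weights are all hypothetical illustrations, not the authors' implementation.

```python
from statistics import mean, pstdev


def total_reward(frame, transparency, anti_repetition, weights=(1.0, 1.0, 1.0)):
    """Combine the three reward terms into one scalar.

    Assumption: a weighted sum with unit weights. The paper's actual
    combination rule is not given in the abstract.
    """
    w_frame, w_think, w_rep = weights
    return w_frame * frame + w_think * transparency + w_rep * anti_repetition


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples for the same question.

    Assumption: advantage = (reward - group mean) / (group std + eps),
    the usual group-relative normalization.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of three sampled responses for one video question.
group = [
    total_reward(1.0, 1.0, 1.0),
    total_reward(0.5, 1.0, 0.5),
    total_reward(0.0, 0.5, 0.5),
]
advantages = group_relative_advantages(group)
```

Responses that score above their group's mean get a positive advantage and those below get a negative one, so the policy is pushed toward the better-evidenced samples without a separate value model.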
