SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

Jitesh Jain; Jialuo Li; Zixian Ma; Jieyu Zhang; Chris Dongjoo Kim; Sangho Lee; Rohun Tripathi; Tanmay Gupta; Christopher Clark; Humphrey Shi

arXiv:2512.13874 · cs.CV · March 31, 2026



TL;DR

SAGE is an agent system, post-trained with reinforcement learning, that reasons flexibly over long videos across multiple turns and improves performance on long-video reasoning tasks.

Contribution

The paper introduces SAGE, a novel agent system for efficient any-horizon video reasoning, trained with synthetic data and RL post-training.

Findings

01. Up to 6.1% improvement on open-ended reasoning tasks.

02. 8.2% boost on videos longer than 10 minutes.

03. An effective RL recipe enhances reasoning ability.

Abstract

As humans, we are natural any-horizon reasoners, i.e., we can decide whether to iteratively skim long videos or watch short ones in full when necessary for a given task. With this in mind, one would expect video reasoning models to reason flexibly across different durations. However, SOTA models are still trained to predict answers in a single turn while processing a large number of frames, akin to watching an entire long video, requiring significant resources. This raises the question: Is it possible to develop performant any-horizon video reasoning systems? Inspired by human behavior, we first propose SAGE, an agent system that performs multi-turn reasoning on long videos while handling simpler problems in a single turn. Secondly, we introduce an easy synthetic data generation pipeline using Gemini-2.5-Flash to train the orchestrator, SAGE-MM, which lies at the core of SAGE. We…
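The any-horizon behavior described in the abstract (skim a long video over several turns, but answer a simple question in one turn) can be pictured as a control loop around the orchestrator. The sketch below is hypothetical and only illustrates that idea: the action types (`Answer`, `ViewSpan`), the uniform frame sampler, the turn budget, and the stub orchestrator and frame describer are all assumptions for illustration, not the interface of SAGE-MM or the allenai/SAGE repository.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Answer:
    """Terminal action: the orchestrator commits to a final answer."""
    text: str


@dataclass
class ViewSpan:
    """Hypothetical action: request more frames from one segment of the video."""
    start_s: float
    end_s: float
    num_frames: int


Action = Union[Answer, ViewSpan]
# Hypothetical orchestrator signature: (question, observations so far) -> next action.
Orchestrator = Callable[[str, List[str]], Action]
# Hypothetical perception step: turns sampled timestamps into a text observation.
FrameDescriber = Callable[[List[float]], str]


def sample_timestamps(start_s: float, end_s: float, num_frames: int) -> List[float]:
    """Return num_frames uniformly spaced timestamps covering [start_s, end_s]."""
    if num_frames < 1 or end_s < start_s:
        raise ValueError("need num_frames >= 1 and end_s >= start_s")
    if num_frames == 1:
        return [(start_s + end_s) / 2]
    step = (end_s - start_s) / (num_frames - 1)
    return [start_s + i * step for i in range(num_frames)]


def run_any_horizon_agent(
    question: str,
    duration_s: float,
    orchestrator: Orchestrator,
    describe_frames: FrameDescriber,
    max_turns: int = 8,
    overview_frames: int = 8,
) -> Tuple[str, int]:
    """Run the loop; return (answer, number of orchestrator turns used)."""
    # Every episode starts with a sparse skim of the whole video.
    observations = [describe_frames(sample_timestamps(0.0, duration_s, overview_frames))]
    for turn in range(1, max_turns + 1):
        action = orchestrator(question, observations)
        if isinstance(action, Answer):
            # Easy questions can end here after a single turn.
            return action.text, turn
        # Clamp the requested span to the video, then add the new evidence.
        start = min(max(0.0, action.start_s), duration_s)
        end = min(max(start, action.end_s), duration_s)
        observations.append(describe_frames(sample_timestamps(start, end, action.num_frames)))
    return "no answer within turn budget", max_turns


def toy_describe(timestamps: List[float]) -> str:
    """Stub perception: summarizes which frames were 'seen'."""
    return f"{len(timestamps)} frames from {timestamps[0]:.0f}s to {timestamps[-1]:.0f}s"


def toy_orchestrator(question: str, observations: List[str]) -> Action:
    """Stub policy: answers 'easy' questions at once, otherwise zooms in once first."""
    if question == "easy" or len(observations) >= 2:
        return Answer("done")
    return ViewSpan(start_s=100.0, end_s=200.0, num_frames=4)
```

With the stub policy, `run_any_horizon_agent("easy", 600.0, toy_orchestrator, toy_describe)` returns `("done", 1)`, while the `"hard"` question takes two turns. Returning the turn count alongside the answer makes the cost trade-off explicit: a learned orchestrator would be rewarded for answering correctly without spending turns it does not need.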


Code & Models

Repositories

allenai/SAGE (GitHub)
