Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding

Shivam Sharma; Sankalp Nagaonkar; Ashish Choithani; Ashutosh Trivedi

arXiv:2604.11177·cs.CV·April 14, 2026

TL;DR

This paper evaluates how internal reasoning traces, called thought streams, affect video scene understanding in Gemini vision-language models. It finds that additional reasoning yields diminishing returns and that behavior differs between model variants.

Contribution

It introduces three evaluation metrics for reasoning in vision-language models (Contentfulness, Thought-Final Coverage, and Dominant Entity Analysis) and analyzes how thought streams affect scene understanding and model behavior.

Findings

01

Quality improvements plateau after a few hundred tokens.

02

Flash Lite balances quality and token efficiency effectively.

03

When the reasoning budget is constrained, models sometimes hallucinate content in the final output that they never reasoned about.
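
To make the plateau finding concrete, here is a minimal sketch of one way to locate the point of diminishing returns on a quality-versus-thinking-budget curve. The function name `plateau_budget`, the threshold `min_gain_per_100_tokens`, and the stopping rule are assumptions made for illustration; they are not taken from the paper, and no numbers from the paper are used.

```python
from typing import Optional, Sequence


def plateau_budget(
    budgets: Sequence[int],
    scores: Sequence[float],
    min_gain_per_100_tokens: float = 0.01,
) -> Optional[int]:
    """Return the first budget after which extra thinking stops paying off.

    budgets: thinking-token budgets in strictly increasing order.
    scores:  a quality score measured at each budget (hypothetical judge score).
    The threshold is an illustrative assumption, not a value from the paper.
    """
    if len(budgets) != len(scores):
        raise ValueError("budgets and scores must have the same length")
    for i in range(len(budgets) - 1):
        extra_tokens = budgets[i + 1] - budgets[i]
        if extra_tokens <= 0:
            raise ValueError("budgets must be strictly increasing")
        # Marginal quality gain, normalised to a step of 100 tokens.
        gain = (scores[i + 1] - scores[i]) / extra_tokens * 100
        if gain < min_gain_per_100_tokens:
            return budgets[i]
    # The curve never flattened within the measured range.
    return None
```

Given per-budget scores from any judge, the function returns the last budget that was still worth its tokens, or `None` if quality kept improving across the whole range.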

Abstract

We benchmark how internal reasoning traces, which we call thought streams, affect video scene understanding in vision-language models. Using four configurations of Google's Gemini 2.5 Flash and Flash Lite across scenes extracted from 100 hours of video, we ask three questions: does more thinking lead to better outputs, where do the gains stop, and what do these models actually think about? We introduce three evaluation metrics. Contentfulness measures how much of the thought stream is useful scene content versus meta-commentary. Thought-Final Coverage measures how faithfully the thought stream translates into the final output. Dominant Entity Analysis identifies which subjects, actions, and settings the model focuses on. GPT-5 serves as an independent judge. We find that quality gains from additional thinking plateau quickly, with most improvement occurring in the first few hundred…
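
The abstract describes Contentfulness and Thought-Final Coverage only in words, so the sketch below shows one plausible way such ratios could be computed. It assumes each thought-stream segment has already been labelled as "content" or "meta", and that entities have already been extracted from the thought stream and the final output. The function names, the labelling scheme, and the set-overlap formula are all illustrative assumptions, not the paper's definitions.

```python
from typing import Iterable, Sequence


def contentfulness(segment_labels: Sequence[str]) -> float:
    """Fraction of thought-stream segments labelled as scene content.

    segment_labels: one label per segment, "content" or "meta"
    (hypothetical labels, e.g. produced by a judge model).
    """
    if not segment_labels:
        return 0.0
    content = sum(1 for label in segment_labels if label == "content")
    return content / len(segment_labels)


def thought_final_coverage(
    thought_items: Iterable[str], final_items: Iterable[str]
) -> float:
    """Fraction of thought-stream items that also appear in the final output.

    Items are compared case-insensitively after stripping whitespace.
    This set overlap is an assumed stand-in for the paper's metric.
    """
    thought = {item.strip().lower() for item in thought_items}
    final = {item.strip().lower() for item in final_items}
    if not thought:
        return 0.0
    return len(thought & final) / len(thought)
```

Under these assumptions, a low coverage score means much of the reasoning never reached the final description, while the reverse overlap (final items missing from the thought stream) would flag output content that was never reasoned about.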

Code & Models

Repositories

video-db/gemini-reasoning-eval (GitHub)
