Which Way Did It Move? Diagnosing and Overcoming Directional Motion Blindness in Video-LLMs

Jongseo Lee; Hyuntak Lee; Sunghun Kim; Sooa Kim; Jihoon Chung; Jinwoo Choi

arXiv:2605.22823 · cs.CV · May 22, 2026

TL;DR

This paper identifies a fundamental failure in Video-LLMs, which it calls directional motion blindness, and introduces datasets and a training objective that improve their ability to understand and predict motion directions in videos.

Contribution

The authors diagnose why Video-LLMs fail to understand motion direction and propose the MoDirect datasets and DeltaDirect training to improve directional perception.

Findings

1. Instruction tuning with DeltaDirect significantly improves motion direction accuracy.
2. DeltaDirect improves real-world motion understanding without harming performance on general video tasks.
3. Motion direction signals are present inside Video-LLMs but are not properly bound to the correct verbal answer.
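The third finding rests on the abstract's observation that motion direction remains linearly accessible from the vision encoder, projector, and LLM hidden states. A linear probe is the standard way to test this: freeze the model, collect features for clips with known directions, and fit a linear classifier on them. The sketch below is a minimal illustration under stated assumptions, not the authors' code or the KHU-VLL/DeltaDirect implementation. The features are synthetic stand-ins for pooled hidden states, and `fit_linear_probe`, `synthetic_features`, and all hyperparameters are hypothetical.

```python
import numpy as np

DIRECTIONS = ["left", "right", "up", "down"]

def fit_linear_probe(X, y, num_classes, lr=0.5, epochs=300, l2=1e-3):
    """Multinomial logistic regression fit by full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)  # softmax probabilities
        grad = (P - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad + l2 * W)  # l2 * W is the weight-decay gradient
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(W, b, X, y):
    return float(((X @ W + b).argmax(axis=1) == y).mean())

def synthetic_features(n, d, rng):
    # Hypothetical stand-in for frozen hidden states: each direction shifts
    # the feature vector along its own random unit vector, plus Gaussian noise.
    y = rng.integers(0, len(DIRECTIONS), size=n)
    centers = rng.normal(size=(len(DIRECTIONS), d))
    centers /= np.linalg.norm(centers, axis=1, keepdims=True)
    X = 2.0 * centers[y] + rng.normal(scale=0.5, size=(n, d))
    return X, y

rng = np.random.default_rng(0)
X, y = synthetic_features(800, 64, rng)
X_tr, y_tr, X_te, y_te = X[:600], y[:600], X[600:], y[600:]
W, b = fit_linear_probe(X_tr, y_tr, len(DIRECTIONS))
acc = probe_accuracy(W, b, X_te, y_te)
print(f"held-out probe accuracy: {acc:.2f} (chance = {1 / len(DIRECTIONS):.2f})")
```

High held-out probe accuracy only shows that direction information is decodable from a layer. It does not show that the model's answer readout uses that information, which is exactly the direction binding gap the paper describes.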

Abstract

Video Large Language Models (Video-LLMs) have made rapid progress on temporal video understanding, yet many fail at a basic perceptual primitive: signed image-plane motion direction. On simple videos of a single object moving left, right, up, or down, most Video-LLMs perform near chance, with above-chance cases largely attributable to prediction biases rather than genuine direction understanding. We call this failure directional motion blindness. We localize the failure by tracing motion direction information through the Video-LLM pipeline. Motion direction remains linearly accessible from the vision encoder, projector, and LLM hidden states, but the readout fails to bind this signal to the correct verbal answer option, revealing a direction binding gap. Although synthetic motion direction instruction tuning reduces this gap on the source domain, motion direction concept vector analysis…
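The abstract attributes most above-chance scores to prediction biases rather than genuine direction understanding. One generic way to check for this (not necessarily the authors' protocol) is to report per-direction recall, balanced accuracy, and the share of each predicted answer alongside raw accuracy. The sketch below assumes a hypothetical, deliberately imbalanced evaluation split and a model that always answers "left". `direction_report` is an illustrative helper, not part of the paper's code.

```python
from collections import Counter

DIRECTIONS = ["left", "right", "up", "down"]

def direction_report(gold, pred):
    """Raw accuracy, balanced accuracy, per-direction recall, and answer shares."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    recall = {}
    for d in DIRECTIONS:
        hits = [p == d for g, p in zip(gold, pred) if g == d]
        recall[d] = sum(hits) / len(hits) if hits else float("nan")
    present = [r for r in recall.values() if r == r]  # drop NaN (absent classes)
    balanced_acc = sum(present) / len(present)
    counts = Counter(pred)
    answer_share = {d: counts[d] / len(pred) for d in DIRECTIONS}
    return acc, balanced_acc, recall, answer_share

# Hypothetical imbalanced split and a constant "left" predictor.
gold = ["left"] * 40 + ["right"] * 20 + ["up"] * 20 + ["down"] * 20
pred = ["left"] * len(gold)
acc, bal, recall, share = direction_report(gold, pred)
print(acc, bal)                       # 0.4 0.25
print(recall["left"], share["left"])  # 1.0 1.0
```

Raw accuracy (0.4) sits above the 0.25 chance level, but balanced accuracy falls back to chance and the answer shares show that every prediction is "left". Both signs point to bias rather than direction understanding.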

Code & Models

Repositories

KHU-VLL/DeltaDirect (GitHub)
