Spatiotemporal Sycophancy: Negation-Based Gaslighting in Video Large Language Models

Ziyao Tang; Pengkun Jiao; Bin Zhu; Huiyan Qi; Jingjing Chen; Yu-Gang Jiang

arXiv:2604.17873·cs.CV·April 21, 2026

Spatiotemporal Sycophancy: Negation-Based Gaslighting in Video Large Language Models

Ziyao Tang, Pengkun Jiao, Bin Zhu, Huiyan Qi, Jingjing Chen, Yu-Gang Jiang

PDF

TL;DR

This paper uncovers a failure mode in Video Large Language Models called spatiotemporal sycophancy, where models conform to misleading feedback and fabricate explanations, revealing significant robustness issues.

Contribution

It introduces a negation-based gaslighting evaluation framework and GasVideo-1000 benchmark to systematically assess and demonstrate the vulnerability of Vid-LLMs to adversarial feedback.

Findings

01

Vulnerability to negation-based gaslighting is widespread across models.

02

Prompt constraints only partially mitigate hallucinations and belief reversals.

03

Current Vid-LLMs lack robust mechanisms for maintaining grounded beliefs.

Abstract

Video Large Language Models (Vid-LLMs) have demonstrated remarkable performance in video understanding tasks, yet their robustness under conversational interaction remains largely underexplored. In this paper, we identify spatiotemporal sycophancy, a failure mode in which Vid-LLMs retract initially correct, visually grounded judgments and conform to misleading user feedback under negation-based gaslighting. Rather than merely changing their answers, the models often fabricate unsupported temporal or spatial explanations to justify incorrect revisions. To systematically investigate this phenomenon, we propose a negation-based gaslighting evaluation framework and introduce GasVideo-1000, a curated benchmark designed to probe spatiotemporal sycophancy with clear visual grounding and temporal reasoning requirements. We evaluate a broad range of state-of-the-art open-source and proprietary…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.