V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge in Vision Language Models
Seyed Mahed Mousavi, Christian Moiola, Massimo Rizzoli, Simone Alghisi, Giuseppe Riccardi

TL;DR
V-DyKnow introduces a benchmark for evaluating how well vision-language models handle time-sensitive facts. It shows that these models tend to produce outdated information and struggle to update their knowledge across modalities.
Contribution
The paper presents V-DyKnow, a new dynamic benchmark for assessing time-sensitive knowledge in vision-language models. It also analyzes how reliable that knowledge is and how well it can be updated.
Findings
VLMs often output outdated facts reflecting their training data snapshots.
Reliability drops when the same question is posed with visual rather than textual stimuli, even when the model correctly recognizes the entity.
Existing knowledge-editing and multimodal RAG methods struggle to update models' knowledge consistently across modalities.
Abstract
Vision-Language Models (VLMs) are trained on data snapshots of documents, including images and texts. Their training data and evaluation benchmarks are typically static, implicitly treating factual knowledge as time-invariant. However, real-world facts are intrinsically time-sensitive and subject to erratic and periodic changes, causing model predictions to become outdated. We present V-DyKnow, a Visual Dynamic Knowledge benchmark for evaluating time-sensitive factual knowledge in VLMs. Using V-DyKnow, we benchmark closed- and open-source VLMs and analyze a) the reliability (correctness and consistency) of model responses across modalities and input perturbations; b) the efficacy of knowledge editing and multi-modal RAG methods for knowledge updates across modalities; and c) the sources of outdated predictions, through data and mechanistic analysis. Our results show that VLMs frequently…
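The abstract separates correctness from consistency and looks for the sources of outdated predictions. The snippet below is a minimal sketch of how such labels could be computed. It assumes that each fact is stored as a chronological list of values whose last entry is the current one. It labels an answer as up-to-date, outdated, or irrelevant, then scores one question asked through several variants (text, image, perturbed wording). The `Fact` schema, the `classify_answer` and `reliability` functions, and the substring matching are hypothetical illustrations, not the benchmark's actual scoring code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Fact:
    """A time-sensitive fact (hypothetical schema, not V-DyKnow's format)."""
    subject: str
    values: list[str]  # oldest first; the last entry is assumed current at the snapshot


def classify_answer(answer: str, fact: Fact) -> tuple[str, str | None]:
    """Label an answer 'up-to-date', 'outdated', or 'irrelevant'.

    Uses case-insensitive substring matching, an assumption made for illustration.
    """
    text = answer.lower()
    # Check the current value first so an answer naming it counts as up-to-date.
    if fact.values[-1].lower() in text:
        return "up-to-date", fact.values[-1]
    # Otherwise look for any earlier (now outdated) value, newest first.
    for value in reversed(fact.values[:-1]):
        if value.lower() in text:
            return "outdated", value
    return "irrelevant", None


def reliability(fact: Fact, answers: dict[str, str]) -> dict[str, float | bool]:
    """Correctness over variants and consistency across them (hypothetical metric).

    `answers` maps a variant name (e.g. 'text', 'image') to the model's answer
    for that variant of the same question.
    """
    labels = {name: classify_answer(a, fact) for name, a in answers.items()}
    correct = sum(label == "up-to-date" for label, _ in labels.values())
    # Consistent only if every variant resolves to the same (label, value) pair.
    consistent = len(set(labels.values())) == 1
    return {"correctness": correct / len(labels), "consistent": consistent}


# Illustrative example: the same question asked with a text and an image prompt.
uk_pm = Fact(
    "Prime Minister of the United Kingdom",
    ["Boris Johnson", "Liz Truss", "Rishi Sunak", "Keir Starmer"],
)
answers = {
    "text": "The current UK Prime Minister is Keir Starmer.",
    "image": "This is Rishi Sunak, the Prime Minister.",
}
print(reliability(uk_pm, answers))  # {'correctness': 0.5, 'consistent': False}
```

Keeping the label and the matched value together makes the sketch flag inconsistency even when two variants are both outdated but name different past values. The abstract's finding that textual stimuli are more reliable than visual ones would show up here as higher correctness on the `text` variants than on the `image` variants.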
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling
