SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications

Yana Hasson; Pauline Luc; Liliane Momeni; Maks Ovsjanikov; Guillaume Le Moing; Alina Kuznetsova; Ira Ktena; Jennifer J. Sun; Skanda Koppula; Dilara Gokay; Joseph Heyward; Etienne Pot; Andrew Zisserman

arXiv:2507.03578·cs.CV·July 8, 2025

TL;DR

This paper introduces SciVid, a benchmark for evaluating video foundation models across scientific domains, demonstrating their transferability and identifying limitations for future development.

Contribution

The paper presents SciVid, a new benchmark of five scientific video tasks, and evaluates six video foundation models (ViFMs) on it, showing both their potential and their limitations in scientific applications.

Findings

01. ViFMs achieve state-of-the-art results on several scientific tasks.

02. Transfer learning from large-scale data is effective across disciplines.

03. The limitations of current ViFMs highlight the need for more generalizable models.

Abstract

In recent years, there has been a proliferation of spatiotemporal foundation models in different scientific disciplines. While promising, these models are often domain-specific and are only assessed within the particular applications for which they are designed. Given that many tasks can be represented as video modeling problems, video foundation models (ViFMs) hold considerable promise as general-purpose domain-agnostic approaches. However, it is not known whether the knowledge acquired on large-scale but potentially out-of-domain data can be effectively transferred across diverse scientific disciplines, and if a single, pretrained ViFM can be competitive with domain-specific baselines. To address this, we introduce SciVid, a comprehensive benchmark comprising five *Sci*entific *Vid*eo tasks, across medical computer vision, animal behavior, and weather forecasting. We adapt six leading…
