Spatio-Temporal Foundation Models: Vision, Challenges, and Opportunities
Adam Goodge, Wee Siong Ng, Bryan Hooi, See Kiong Ng

TL;DR
This paper articulates a vision for spatio-temporal foundation models (STFMs), examines why they have not yet matched the success of foundation models for vision and language, and outlines research directions relevant to critical domains such as transportation, public health, and environmental monitoring.
Contribution
It sets out the essential characteristics and generalization capabilities an ideal STFM should have, critically assesses current research against those traits, and identifies the resulting gaps, challenges, and opportunities.
Findings
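STFMs have not yet achieved success comparable to foundation models for vision and language.
Current research falls short of the generalization capabilities needed for broad applicability.
Key challenges impede progress, and the paper proposes opportunities and directions for addressing them.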
Abstract
Foundation models have revolutionized artificial intelligence, setting new benchmarks in performance and enabling transformative capabilities across a wide range of vision and language tasks. However, despite the prevalence of spatio-temporal data in critical domains such as transportation, public health, and environmental monitoring, spatio-temporal foundation models (STFMs) have not yet achieved comparable success. In this paper, we articulate a vision for the future of STFMs, outlining their essential characteristics and the generalization capabilities necessary for broad applicability. We critically assess the current state of research, identifying gaps relative to these ideal traits, and highlight key challenges that impede their progress. Finally, we explore potential opportunities and directions to advance research towards the aim of effective and broadly applicable STFMs.
Taxonomy
Topics: 3D Modeling in Geospatial Applications · Geographic Information Systems Studies
