BioCube: A Multimodal Dataset for Biodiversity Research
Stylianos Stasinos, Martino Mensio, Elena Lazovik, Athanasios Trantas

TL;DR
BioCube is a comprehensive, multimodal global dataset designed to advance biodiversity research by providing detailed, geospatially aligned ecological data from 2000 to 2020 for machine learning applications.
Contribution
The paper introduces BioCube, a novel, large-scale, multimodal biodiversity dataset with high spatial and temporal resolution, enabling improved ecological modeling and analysis.
Findings
The dataset includes species images, audio recordings, environmental DNA, climate variables, and land indicators.
It spans 2000 to 2020 with global coverage.
It is publicly available on Hugging Face.
Abstract
Biodiversity research requires complete and detailed information to study ecosystem dynamics at different scales. Data-driven methods such as Machine Learning are gaining traction in ecology, and more specifically in biodiversity research, offering alternative modelling pathways. For these methods to deliver accurate results, there is a need for large, curated, multimodal datasets that offer granular spatial and temporal resolutions. In this work, we introduce BioCube, a multimodal, fine-grained global dataset for ecology and biodiversity research. BioCube incorporates species observations through images, audio recordings and descriptions, environmental DNA, vegetation indices, agricultural, forest, and land indicators, and high-resolution climate variables. All observations are geospatially aligned under the WGS84 geodetic system, spanning from 2000 to 2020. The dataset is available at…
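To illustrate what "geospatially aligned" can mean in practice, here is a minimal Python sketch that snaps WGS84 latitude/longitude observations onto a regular grid and groups them by cell and month within the 2000 to 2020 window. It is not BioCube's actual pipeline: the 0.25-degree cell size, the monthly binning, the record field names (`lat`, `lon`, `date`, `modality`), and the helper names `snap_to_grid` and `align` are all assumptions made for illustration.

```python
import math
from datetime import date


def snap_to_grid(lat, lon, resolution=0.25):
    """Return the south-west corner of the grid cell containing (lat, lon).

    Coordinates are assumed to be WGS84 decimal degrees. The 0.25-degree
    default is an illustrative assumption, not a value from the paper.
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinate outside WGS84 bounds")
    cell_lat = math.floor(lat / resolution) * resolution
    cell_lon = math.floor(lon / resolution) * resolution
    return (round(cell_lat, 6), round(cell_lon, 6))


def align(observations, start=date(2000, 1, 1), end=date(2020, 12, 31),
          resolution=0.25):
    """Group observation modalities by (grid cell, year, month).

    Observations dated outside [start, end] are skipped. Each observation
    is a dict with hypothetical keys: lat, lon, date, modality.
    """
    cube = {}
    for obs in observations:
        if not (start <= obs["date"] <= end):
            continue
        cell = snap_to_grid(obs["lat"], obs["lon"], resolution)
        key = (cell, obs["date"].year, obs["date"].month)
        cube.setdefault(key, []).append(obs["modality"])
    return cube


# Hypothetical records: two fall in the same cell and month, one predates 2000.
records = [
    {"lat": 52.37, "lon": 4.89, "date": date(2010, 5, 3), "modality": "image"},
    {"lat": 52.30, "lon": 4.80, "date": date(2010, 5, 20), "modality": "audio"},
    {"lat": 52.37, "lon": 4.89, "date": date(1999, 5, 3), "modality": "edna"},
]
cube = align(records)
```

Keying every modality on the same (cell, year, month) tuple is what lets a model look up images, audio, and climate values for one place and time together; a real pipeline would likely use a dedicated geospatial library rather than plain dictionaries.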
Taxonomy
Methods: Balanced Selection
