"Birds in the Clouds": Adventures in Data Engineering
N. Cherel (Cornell Tech), J. Reesman (Cornell Tech), A. Sahuguet (Cornell Tech), T. Auer (Cornell Laboratory of Ornithology), D. Fink (Cornell Laboratory of Ornithology)

TL;DR
This paper describes how the Cornell Lab of Ornithology moved its bird migration data pipeline from a physical cluster to the cloud, cutting operating costs by a factor of 6 and enabling scalable, efficient data processing for scientific and educational use.
Contribution
The paper presents a practical approach to migrating complex data pipelines to the cloud using open-source tools and cloud "marketplaces", achieving both lower costs and scalability.
Findings
Operating costs reduced by a factor of 6
Successful deployment of STEM maps on cloud infrastructure
Enhanced scalability of bird migration data processing
Abstract
Leveraging their eBird crowdsourcing project, the Cornell Lab of Ornithology generates sophisticated Spatio-Temporal Exploratory Model (STEM) maps of bird migrations. Such maps are highly relevant for both scientific and educational purposes, but creating them requires advanced modeling techniques that rely on long and potentially expensive computations. In this paper, we share our experience porting the eBird STEM data pipeline from a physical cluster to the cloud, providing a seamless deployment at a lower cost. Using open source tools and cloud "marketplaces", we managed to divide the operating costs by a factor of 6, making it possible to scale our pipeline on a research budget.
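As a reading aid (this calculation is not from the paper), dividing a cost by a factor of 6 leaves one sixth of the original, which corresponds to a saving of about 83%. A minimal Python sketch of that arithmetic; the function name `cost_reduction` is hypothetical:

```python
def cost_reduction(factor: float) -> tuple[float, float]:
    """Return (remaining fraction of the original cost, fractional savings)
    when a cost is divided by `factor` (hypothetical helper for illustration)."""
    remaining = 1.0 / factor
    return remaining, 1.0 - remaining


remaining, savings = cost_reduction(6)
# Prints "16.7% of the original cost, 83.3% saved"
print(f"{remaining:.1%} of the original cost, {savings:.1%} saved")
```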
Taxonomy
Topics: Species Distribution and Climate Change · Ecology and Vegetation Dynamics Studies · Wildlife Ecology and Conservation
