From Pixels to Patches: Pooling Strategies for Earth Embeddings
Isaac Corley, Caleb Robinson, Inbal Becker-Reshef, Juan M. Lavista Ferres

TL;DR
This paper evaluates pooling strategies for geospatial pixel embeddings and shows that richer statistical pooling generalizes better across geographic regions than the default mean pooling.
Contribution
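The paper introduces EuroSAT-Embed (81,000 embedding GeoTIFFs derived from AlphaEarth, OlmoEarth, and Tessera), benchmarks 11 training-free pooling methods and 2 train-set-fitted baselines, and recommends distributional statistics pooling for better geographic generalization.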
Findings
Richer pooling reduces the geographic generalization gap by over 50% relative to mean pooling.
Distributional statistics pooling improves accuracy by up to 6%.
Simple statistics outperform mean pooling across multiple models.
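To make the first finding concrete, here is a minimal sketch of how a relative gap reduction could be computed. It assumes the geographic generalization gap is defined as accuracy on the random split minus accuracy on the geographically disjoint split; the summary does not spell out the definition, so treat that as an assumption. The function names and all accuracy values below are made-up placeholders for illustration, not results from the paper.

```python
def generalization_gap(acc_random: float, acc_geo: float) -> float:
    """Assumed definition: random-split accuracy minus geographic-split accuracy."""
    return acc_random - acc_geo


def relative_gap_reduction(gap_baseline: float, gap_method: float) -> float:
    """Fraction of the baseline gap that a method removes (0.5 means 50%)."""
    if gap_baseline == 0:
        raise ValueError("baseline gap is zero; relative reduction is undefined")
    return (gap_baseline - gap_method) / gap_baseline


# Placeholder accuracies, NOT taken from the paper.
gap_mean = generalization_gap(acc_random=0.95, acc_geo=0.85)   # mean pooling
gap_rich = generalization_gap(acc_random=0.96, acc_geo=0.92)   # richer pooling
reduction = relative_gap_reduction(gap_mean, gap_rich)

print(f"gap (mean pooling):   {gap_mean:.3f}")
print(f"gap (richer pooling): {gap_rich:.3f}")
print(f"relative reduction:   {reduction:.1%}")
```

Under this definition, a reduction above 0.5 corresponds to the "over 50%" claim, and it can hold even when the random-split accuracy barely changes.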
Abstract
Geospatial foundation models increasingly expose pixel-level embedding products that can be downloaded and reused without access to the underlying encoder. In this setting, downstream tasks with patch- or region-level labels require a post-hoc aggregation step that maps dense pixel embeddings to a single representation. The default choice, mean pooling, discards within-patch variability and can underperform under spatial distribution shift. To study this setting, we introduce EuroSAT-Embed: 81,000 embedding GeoTIFFs derived from three foundation models: AlphaEarth, OlmoEarth, and Tessera. Using these fixed embedding products, we benchmark 11 training-free pooling methods and 2 train-set-fitted baselines under both random and geographically disjoint test splits. Richer pooling schemes reduce the geographic generalization gap by over 50% relative to mean pooling and improve accuracy by up…
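The aggregation step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each patch is a band-first array of shape (C, H, W), as a GeoTIFF reader would typically return, and it assumes "distributional statistics" means per-channel mean, standard deviation, minimum, and maximum. The exact set of statistics used in the paper is not given in this summary, and the names `mean_pool` and `stats_pool` are hypothetical.

```python
import numpy as np


def mean_pool(emb: np.ndarray) -> np.ndarray:
    """Average a (C, H, W) pixel-embedding array over space, giving shape (C,)."""
    c = emb.shape[0]
    return emb.reshape(c, -1).mean(axis=1)


def stats_pool(emb: np.ndarray) -> np.ndarray:
    """Concatenate per-channel mean, std, min, and max, giving shape (4 * C,).

    The choice of these four statistics is an assumption for illustration.
    """
    c = emb.shape[0]
    flat = emb.reshape(c, -1)  # (C, H * W): one row of pixel values per channel
    return np.concatenate(
        [flat.mean(axis=1), flat.std(axis=1), flat.min(axis=1), flat.max(axis=1)]
    )


# Two synthetic patches with the same per-channel mean but different spread.
uniform = np.zeros((4, 8, 8), dtype=np.float32)
checker = np.ones((4, 8, 8), dtype=np.float32)
checker[:, ::2, 1::2] = -1.0
checker[:, 1::2, ::2] = -1.0

same_under_mean = np.allclose(mean_pool(uniform), mean_pool(checker))
same_under_stats = np.allclose(stats_pool(uniform), stats_pool(checker))
print("identical under mean pooling: ", same_under_mean)
print("identical under stats pooling:", same_under_stats)
```

The two patches collapse to the same vector under mean pooling even though one is homogeneous and the other is highly textured; the extra statistics keep that within-patch variability available to a downstream classifier, at the cost of a feature vector four times longer.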
