A Data-Centric Vision Transformer Baseline for SAR Sea Ice Classification
David Mike-Ewewie, Panhapiseth Lim, Priyanka Kumar

TL;DR
This paper establishes a SAR-only baseline using Vision Transformers for Arctic sea ice classification, demonstrating the effectiveness of focal loss in handling class imbalance and setting a foundation for future multimodal fusion.
Contribution
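It introduces a Vision Transformer baseline trained on the AI4Arctic/ASIP Sea Ice Dataset (v2) using full-resolution Sentinel-1 Extra Wide inputs, leakage-aware stratified patch splitting, SIGRID-3 stage-of-development labels, and training-set normalization, providing a trustworthy SAR-only benchmark for future multimodal approaches.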
Findings
ViT-Large with focal loss achieves 69.6% accuracy on held-out data.
Focal loss improves precision-recall trade-off for rare ice classes.
The baseline offers a cleaner reference for future multimodal fusion research.
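For readers unfamiliar with focal loss, the following is a minimal sketch of the standard multi-class formulation, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), written in plain Python for a single sample. The value gamma = 2.0, the optional per-class alpha weights, and the function names are illustrative assumptions; this summary does not state the settings the paper used.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Standard cross-entropy for one sample: -log(p_t)."""
    return -math.log(softmax(logits)[target])

def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample.

    gamma (assumed value) controls how strongly well-classified
    samples are down-weighted; alpha is an optional list of
    per-class weights (hypothetical, not taken from the paper).
    """
    p_t = softmax(logits)[target]
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct sample versus a misclassified one (true class 0).
easy = [4.0, 0.0, 0.0]
hard = [0.0, 1.0, 0.0]

easy_ratio = focal_loss(easy, 0) / cross_entropy(easy, 0)
hard_ratio = focal_loss(hard, 0) / cross_entropy(hard, 0)
```

With gamma = 0 and no alpha the function reduces to ordinary cross-entropy. For gamma > 0 the easy sample retains a much smaller share of its cross-entropy loss than the hard one, which is why the loss shifts training effort toward rare or confusable classes.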
Abstract
Accurate and automated sea ice classification is important for climate monitoring and maritime safety in the Arctic. While Synthetic Aperture Radar (SAR) is the operational standard because of its all-weather capability, it remains challenging to distinguish morphologically similar ice classes under severe class imbalance. Rather than claiming a fully validated multimodal system, this paper establishes a trustworthy SAR-only baseline that future fusion work can build upon. Using the AI4Arctic/ASIP Sea Ice Dataset (v2), which contains 461 Sentinel-1 scenes matched with expert ice charts, we combine full-resolution Sentinel-1 Extra Wide inputs, leakage-aware stratified patch splitting, SIGRID-3 stage-of-development labels, and training-set normalization to evaluate Vision Transformer baselines. We compare ViT-Base models trained with cross-entropy and weighted cross-entropy against a…
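The abstract does not describe how the leakage-aware split or the training-set normalization is implemented. As a rough sketch of the general idea only, the code below assumes that leakage is avoided by keeping every patch from a given scene in the same split, and that normalization statistics are computed on training patches alone. The patch layout (`scene_id`, `pixels`), the function names, and the 20% test fraction are all hypothetical, and class stratification is omitted for brevity.

```python
import random
import statistics

def split_by_scene(patches, test_fraction=0.2, seed=0):
    """Assign whole scenes to train or test so that no scene
    contributes patches to both splits (assumed leakage rule)."""
    scene_ids = sorted({p["scene_id"] for p in patches})
    rng = random.Random(seed)
    rng.shuffle(scene_ids)
    n_test = max(1, round(len(scene_ids) * test_fraction))
    test_scenes = set(scene_ids[:n_test])
    train = [p for p in patches if p["scene_id"] not in test_scenes]
    test = [p for p in patches if p["scene_id"] in test_scenes]
    return train, test

def fit_normalizer(train):
    """Compute mean and standard deviation from training pixels only."""
    values = [v for p in train for v in p["pixels"]]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero variance
    return mean, std

def normalize(patch, mean, std):
    """Apply the training-set statistics to any patch, train or test."""
    return [(v - mean) / std for v in patch["pixels"]]

# Toy data: 5 hypothetical scenes with 4 patches each.
patches = [
    {"scene_id": f"scene_{i // 4}", "pixels": [float(i), float(i + 1)]}
    for i in range(20)
]
train, test = split_by_scene(patches)
mean, std = fit_normalizer(train)
test_normalized = [normalize(p, mean, std) for p in test]
```

The key design point is that held-out patches are scaled with statistics they did not contribute to, so the reported accuracy is not inflated by information from the evaluation scenes.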
