VistaFormer: Scalable Vision Transformers for Satellite Image Time Series Segmentation
Ezra MacDonald, Derek Jacoby, and Yvonne Coady

TL;DR
VistaFormer is a scalable, lightweight Transformer model for segmenting satellite image time series. It improves accuracy and efficiency through multi-scale encoding and position-free self-attention, with Neighbourhood Attention as a more scalable alternative to full self-attention.
Contribution
The paper introduces VistaFormer, a novel Transformer-based architecture that enhances satellite image segmentation with reduced computational cost and improved performance over existing models.
Findings
VistaFormer outperforms comparable models on the PASTIS and MTLCC crop-type segmentation benchmarks.
It achieves similar or better accuracy with significantly fewer floating-point operations (FLOPs).
Replacing Multi-Head Self-Attention (MHSA) with Neighbourhood Attention (NA) further improves scalability and efficiency.
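To give a rough sense of why restricting attention to a neighbourhood scales better than full self-attention, the sketch below counts the query-key scores a single attention head computes in each case. The function name `attention_pairs`, the grid sizes, and the 7x7 window are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def attention_pairs(num_tokens, window=None):
    """Count the query-key scores computed by one attention head.

    window=None models full self-attention (every token attends to every token).
    Otherwise each token attends to at most `window` neighbouring tokens.
    """
    if window is None:
        return num_tokens * num_tokens
    return num_tokens * min(window, num_tokens)


# Hypothetical square token grids; the 7x7 neighbourhood is an assumption.
for side in (32, 64, 128):
    tokens = side * side
    full = attention_pairs(tokens)
    local = attention_pairs(tokens, window=7 * 7)
    print(f"{side}x{side} grid: full={full}, neighbourhood={local}")
```

Full self-attention grows quadratically with the number of tokens, while the neighbourhood count grows linearly once the window size is fixed.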
Abstract
We introduce VistaFormer, a lightweight Transformer-based model architecture for the semantic segmentation of remote-sensing images. This model uses a multi-scale Transformer-based encoder with a lightweight decoder that aggregates global and local attention captured in the encoder blocks. VistaFormer uses position-free self-attention layers, which simplifies the model architecture and removes the need to interpolate temporal and spatial codes, an operation that can reduce model performance when training and testing image resolutions differ. We investigate simple techniques for filtering noisy input signals like clouds and demonstrate that improved model scalability can be achieved by substituting Multi-Head Self-Attention (MHSA) with Neighbourhood Attention (NA). Experiments on the PASTIS and MTLCC crop-type segmentation benchmarks show that VistaFormer achieves better performance than comparable…
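The substitution of MHSA with NA described in the abstract can be illustrated with a minimal single-head, 1-D sketch in plain Python. This is not the authors' implementation: the function names, the 1-D setting, and the rule that shifts the window inward at the sequence edges are assumptions made for illustration. No positional encoding is added anywhere, so the same functions run unchanged on sequences of any length.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(a * b for a, b in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def global_attention(q, k, v):
    """Full self-attention: every token attends to every token."""
    return [attend(qi, k, v) for qi in q]


def neighbourhood_attention_1d(q, k, v, window):
    """Each token attends only to `window` consecutive tokens around it.

    Assumption of this sketch: near the edges the window is shifted inward
    so that every token still sees exactly `window` neighbours.
    """
    n = len(q)
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and the sequence length")
    out = []
    for i in range(n):
        start = min(max(i - window // 2, 0), n - window)
        stop = start + window
        out.append(attend(q[i], k[start:stop], v[start:stop]))
    return out
```

When `window` equals the sequence length, the neighbourhood covers every token and the result matches `global_attention`; smaller windows trade global context for a cost that grows linearly with sequence length.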
Taxonomy
Topics: Satellite Image Processing and Photogrammetry · Astronomical Observations and Instrumentation
Methods: Softmax · Attention Is All You Need
