VistaFormer: Scalable Vision Transformers for Satellite Image Time Series Segmentation

Ezra MacDonald, Derek Jacoby, and Yvonne Coady

arXiv:2409.08461 · cs.CV · September 16, 2024 · 2 citations

TL;DR

VistaFormer is a scalable, lightweight Transformer model for satellite image time series segmentation. It improves accuracy and efficiency by combining multi-scale encoding, position-free self-attention, and neighbourhood attention.
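
The multi-scale encoding idea can be illustrated at the level of tensor shapes. The NumPy sketch below is not VistaFormer's implementation, and it rests on several assumptions. It ignores the time axis. It uses average pooling as a stand-in for learned patch embeddings and Transformer blocks. It assumes a hypothetical stride pattern of (4, 2, 2, 2). Its decoder, also hypothetical, upsamples each stage by nearest-neighbour repetition and concatenates the channels.

```python
import numpy as np

def avg_pool2d(x: np.ndarray, factor: int) -> np.ndarray:
    """Downsample a (C, H, W) map by averaging non-overlapping factor x factor blocks."""
    c, h, w = x.shape
    assert h % factor == 0 and w % factor == 0
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Upsample a (C, H, W) map by repeating each pixel factor x factor times."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def multiscale_encode(x: np.ndarray, strides=(4, 2, 2, 2)) -> list:
    """Return one feature map per stage, each at a coarser resolution than the last.
    Hypothetical: average pooling stands in for patch embedding + Transformer blocks."""
    features = []
    for s in strides:
        x = avg_pool2d(x, s)
        features.append(x)
    return features

def lightweight_decode(features: list) -> np.ndarray:
    """Hypothetical decoder: bring every stage to the finest stage's resolution,
    then concatenate along the channel axis."""
    target_h = features[0].shape[1]
    upsampled = [upsample_nearest(f, target_h // f.shape[1]) for f in features]
    return np.concatenate(upsampled, axis=0)

# Hypothetical (C, H, W) input: 8 channels on a 64 x 64 grid.
x = np.random.default_rng(0).normal(size=(8, 64, 64))
feats = multiscale_encode(x)
print([f.shape for f in feats])
fused = lightweight_decode(feats)
print(fused.shape)
```

After the first stage, each coarser stage holds a quarter as many tokens as the one before it. This is where hierarchical encoders save computation. A real encoder would also typically widen the channel dimension from stage to stage, which this sketch does not do.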

Contribution

The paper introduces VistaFormer, a Transformer-based architecture for satellite image time series segmentation that lowers computational cost while improving performance over existing models.

Findings

1. VistaFormer outperforms comparable models on the PASTIS and MTLCC crop-type segmentation benchmarks.
2. It achieves similar or better accuracy with significantly fewer floating-point operations (FLOPs).
3. Replacing Multi-Head Self-Attention (MHSA) with Neighbourhood Attention (NA) further improves scalability and efficiency.
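
The scalability gain from NA follows from how many query-key scores each attention layer computes. Global Multi-Head Self-Attention (MHSA) over an H x W token grid scores every pair of tokens, which gives (HW)^2 scores per head. Neighbourhood Attention (NA) restricts each query to a k x k window, which gives HW·k^2 scores per head. The counter below is only a back-of-the-envelope sketch. The kernel size k = 7 and the grid sizes are assumptions. It also ignores projections, the softmax, and the time axis.

```python
def full_attention_scores(h: int, w: int) -> int:
    """Query-key scores for global self-attention on an h x w token grid (per head)."""
    n = h * w
    return n * n

def neighbourhood_attention_scores(h: int, w: int, k: int) -> int:
    """Query-key scores when each token attends to a k x k neighbourhood (per head).
    Assumes border windows are shifted inside the grid, so every query sees k*k keys."""
    assert k <= h and k <= w
    return h * w * k * k

# Hypothetical kernel size k = 7 on square grids of increasing size.
for side in (16, 32, 64):
    full = full_attention_scores(side, side)
    na = neighbourhood_attention_scores(side, side, k=7)
    print(f"{side}x{side}: MHSA={full:,} NA={na:,} ratio={full / na:.1f}")
```

The ratio of the two counts is HW / k^2. The saving therefore grows linearly with the number of tokens, so NA matters most at higher input resolutions.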

Abstract

We introduce VistaFormer, a lightweight Transformer-based model architecture for the semantic segmentation of remote-sensing images. The model uses a multi-scale Transformer-based encoder with a lightweight decoder that aggregates global and local attention captured in the encoder blocks. VistaFormer uses position-free self-attention layers, which simplify the model architecture. They also remove the need to interpolate temporal and spatial codes, an interpolation step that can reduce model performance when training and testing image resolutions differ. We investigate simple techniques for filtering noisy input signals such as clouds. We also demonstrate that improved model scalability can be achieved by substituting Multi-Head Self-Attention (MHSA) with Neighbourhood Attention (NA). Experiments on the PASTIS and MTLCC crop-type segmentation benchmarks show that VistaFormer achieves better performance than comparable…
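
Both attention variants named in the abstract can be sketched in a few lines of NumPy. The sketch assumes that "position-free" means no positional encoding is added to the tokens. It uses a single head with random projection matrices. It implements neighbourhood attention naively, with Python loops and windows shifted to stay inside the grid. The function names are hypothetical, and none of this code is taken from the VistaFormer repository. Nothing in the weights depends on the grid size, so the same parameters run unchanged on 8 x 8 and 12 x 12 inputs, with no positional table to interpolate.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_self_attention(tokens, wq, wk, wv):
    """Single-head self-attention over (N, D) tokens; no positional encoding is added."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def neighbourhood_attention(grid, wq, wk, wv, kernel=3):
    """Single-head 2D neighbourhood attention on an (H, W, D) grid.
    Each query attends only to a kernel x kernel window, shifted to stay inside the grid."""
    h, w, _ = grid.shape
    q, k, v = grid @ wq, grid @ wk, grid @ wv
    out = np.empty((h, w, v.shape[-1]))
    r = kernel // 2
    for i in range(h):
        i0 = min(max(i - r, 0), h - kernel)  # clamp window start at the borders
        for j in range(w):
            j0 = min(max(j - r, 0), w - kernel)
            keys = k[i0:i0 + kernel, j0:j0 + kernel].reshape(-1, k.shape[-1])
            vals = v[i0:i0 + kernel, j0:j0 + kernel].reshape(-1, v.shape[-1])
            weights = softmax(keys @ q[i, j] / np.sqrt(q.shape[-1]))
            out[i, j] = weights @ vals
    return out

rng = np.random.default_rng(0)
d = 16
wq, wk, wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

# The same weights apply to any grid size: there is no positional table to resize.
for side in (8, 12):
    grid = rng.normal(size=(side, side, d))
    g = global_self_attention(grid.reshape(-1, d), wq, wk, wv).reshape(side, side, d)
    n = neighbourhood_attention(grid, wq, wk, wv, kernel=3)
    print(side, g.shape, n.shape)
```

When the kernel covers the whole grid, neighbourhood attention reduces to global attention. Practical implementations use dedicated kernels (for example, the NATTEN library) rather than per-pixel Python loops.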

Code & Models

Repositories

macdonaldezra/VistaFormer (PyTorch, official)

Taxonomy

Topics: Satellite Image Processing and Photogrammetry · Astronomical Observations and Instrumentation

Methods: Softmax · Attention Is All You Need