TreeFormers -- An Exploration of Vision Transformers for Deforestation Driver Classification
Uche Ochuba

TL;DR
This paper explores the use of vision transformers to classify deforestation drivers from satellite images, achieving 72.9% test accuracy and offering insights into model performance and limitations.
Contribution
It introduces a ViT-based approach that combines data augmentation with an embedding of longitudinal data to classify deforestation drivers from satellite imagery.
Findings
Achieved 72.9% test accuracy
Enhanced classification with data augmentation techniques
Identified strengths and limitations of ViT in this context
Abstract
This paper addresses the critical issue of deforestation by exploring the application of vision transformers (ViTs) for classifying the drivers of deforestation using satellite imagery from Indonesian forests. Motivated by the urgency of this problem, I propose an approach that leverages ViTs and machine learning techniques. The input to my algorithm is a 332x332-pixel satellite image, and I employ a ViT architecture to predict the deforestation driver class: grassland shrubland, other, plantation, or smallholder agriculture. My methodology involves fine-tuning a pre-trained ViT on a dataset from the Stanford ML Group, and I experiment with rotational data augmentation techniques (among others) and embedding of longitudinal data to improve classification accuracy. I also tried training a ViT from scratch. Results indicate a significant improvement over baseline models, achieving a test…
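The rotational augmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which angles were used, so rotating by multiples of 90 degrees is an assumption here, as are the helper names `rotations` and `augment` and the random stand-in images; this is not the author's pipeline. Only the 332x332 input size and the four class names come from the abstract.

```python
import numpy as np

# Class names as listed in the abstract; the index order is an assumption.
CLASSES = ["grassland shrubland", "other", "plantation", "smallholder agriculture"]


def rotations(image: np.ndarray) -> list:
    """Return the image rotated by 0, 90, 180 and 270 degrees (assumed angles)."""
    return [np.rot90(image, k=k, axes=(0, 1)) for k in range(4)]


def augment(images: np.ndarray, labels: np.ndarray):
    """Expand a batch fourfold; each rotated copy keeps its source label."""
    aug_images, aug_labels = [], []
    for img, lab in zip(images, labels):
        for rotated in rotations(img):
            aug_images.append(rotated)
            aug_labels.append(lab)
    return np.stack(aug_images), np.array(aug_labels)


# Random stand-ins for two 332x332 RGB satellite tiles (hypothetical data).
rng = np.random.default_rng(0)
images = rng.random((2, 332, 332, 3))
labels = np.array([2, 0])  # "plantation", "grassland shrubland"

aug_x, aug_y = augment(images, labels)
print(aug_x.shape, [CLASSES[i] for i in aug_y])
```

Because the tiles are square, each rotated copy keeps the 332x332 shape, so the augmented batch can be stacked without resizing. The label is unchanged by rotation, which is what makes this a safe augmentation for overhead imagery with no canonical "up" direction.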
Taxonomy
Topics: Remote Sensing and LiDAR Applications
