Habitat Classification from Ground-Level Imagery Using Deep Neural Networks
Hongrui Shi, Lisa Norton, Lucy Ridding, Simon Rolph, Tom August, Claire M Wood, Lan Qie, Petra Bosilj, James M Brown

TL;DR
This study applies deep neural networks, especially vision transformers, to classify habitats from ground-level imagery, achieving expert-level accuracy and offering a scalable tool for biodiversity conservation.
Contribution
The study demonstrates that vision transformers outperform CNNs at habitat classification from ground-level images, and that supervised contrastive learning improves discrimination between visually similar habitats.
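To make the supervised contrastive idea concrete, here is a minimal pure-Python sketch of the commonly used supervised contrastive (SupCon) loss. The summary above does not give the paper's exact formulation, so the function name `supcon_loss`, the temperature value, and the toy embeddings are all assumptions for illustration. Each anchor is pulled towards other samples with the same habitat label and pushed away from all remaining samples.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Average supervised contrastive loss over anchors that have a positive.

    `temperature=0.1` is an illustrative default, not a value from the paper.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: other samples sharing the anchor's label.
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        # Temperature-scaled similarity to every other sample.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp over all non-anchor samples, computed stably.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Mean negative log-probability of the positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    if anchors == 0:
        raise ValueError("no anchor has a same-label positive in this batch")
    return total / anchors


# Toy 2-D embeddings forming two tight clusters (hypothetical data).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_labels = [0, 0, 1, 1]     # labels agree with the clusters
misaligned_labels = [0, 1, 0, 1]  # labels cut across the clusters

loss_aligned = supcon_loss(toy_embeddings, aligned_labels)
loss_misaligned = supcon_loss(toy_embeddings, misaligned_labels)
```

In this toy setup the loss is much lower when same-label samples sit close together in embedding space. That is the property the objective rewards, and a plausible reason it would help separate similar habitats.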
Findings
Vision transformers achieve 91% Top-3 accuracy.
Supervised contrastive learning reduces misclassification among similar habitats.
Model performance matches that of ecological experts in habitat classification.
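For readers unfamiliar with the Top-3 metric quoted above, the sketch below shows how such a figure is typically computed: a prediction counts as correct if the true habitat is among the three highest-scoring classes. The function name `top_k_accuracy` and the four-class toy scores are illustrative assumptions, not data from the paper.

```python
def top_k_accuracy(scores, labels, k=3):
    """Fraction of samples whose true label is among the k top-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical per-class scores for four images and four classes.
toy_scores = [
    [0.05, 0.50, 0.30, 0.15],  # true class 2 is ranked second
    [0.70, 0.10, 0.15, 0.05],  # true class 0 is ranked first
    [0.40, 0.30, 0.20, 0.10],  # true class 3 is ranked last
    [0.10, 0.25, 0.60, 0.05],  # true class 1 is ranked second
]
toy_labels = [2, 0, 3, 1]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)  # 0.25
top3 = top_k_accuracy(toy_scores, toy_labels, k=3)  # 0.75
```

Top-k accuracy is always at least as high as Top-1, so a Top-3 figure should be compared against other Top-3 results rather than against plain accuracy.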
Abstract
Habitat assessment at local scales, critical for enhancing biodiversity and guiding conservation priorities, often relies on expert field surveys that can be costly, motivating the exploration of AI-driven tools to automate and refine this process. While most AI-driven habitat mapping depends on remote sensing, it is often constrained by sensor availability, weather, and coarse resolution. In contrast, ground-level imagery captures essential structural and compositional cues invisible from above and remains underexplored for robust, fine-grained habitat classification. This study addresses this gap by applying state-of-the-art deep neural network architectures to ground-level habitat imagery. Leveraging data from the UK Countryside Survey covering 18 broad habitat types, we evaluate two families of models, convolutional neural networks (CNNs) and vision transformers (ViTs), under…
