LC4-DViT: Land-cover Creation for Land-cover Classification with Deformable Vision Transformer
Kai Wang, Siyi Chen, Weicong Pang, Chenchen Zhang, Renjun Gao, Ziru Chen, Cheng Li, Dasa Gu, Rui Huang, Alexis Kai Hon Lau

TL;DR
This paper introduces LC4-DViT, a framework that pairs text-guided generative data creation with a deformation-aware Vision Transformer (DViT) to improve high-resolution land-cover classification accuracy.
Contribution
It proposes a novel framework integrating GPT-4o-guided image synthesis and deformation-aware transformers for more accurate and transferable land-cover mapping.
Findings
Achieves over 95% accuracy on eight land-cover classes.
Outperforms a baseline ViT as well as CNN models such as ResNet50 and MobileNetV2.
Demonstrates good transferability across datasets.
Abstract
Land-cover underpins ecosystem services, hydrologic regulation, disaster-risk reduction, and evidence-based land planning; timely, accurate land-cover maps are therefore critical for environmental stewardship. Remote sensing-based land-cover classification offers a scalable route to such maps but is hindered by scarce and imbalanced annotations and by geometric distortions in high-resolution scenes. We propose LC4-DViT (Land-cover Creation for Land-cover Classification with Deformable Vision Transformer), a framework that combines generative data creation with a deformation-aware Vision Transformer. A text-guided diffusion pipeline uses GPT-4o-generated scene descriptions and super-resolved exemplars to synthesize class-balanced, high-fidelity training images, while DViT couples a DCNv4 deformable convolutional backbone with a Vision Transformer encoder to jointly capture fine-scale…
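The abstract describes synthesizing class-balanced training images. As a rough illustration only, the sketch below computes how many synthetic images each class would need to match the largest class. The function name `synthetic_quota`, the class names, and the "top up to the largest class" rule are all assumptions for illustration; the summary does not say how the authors set per-class quotas.

```python
from collections import Counter


def synthetic_quota(labels, target=None):
    """Return the number of synthetic images each class needs to reach `target`.

    `target` defaults to the size of the largest class (an assumed balancing
    rule, not one taken from the paper).
    """
    counts = Counter(labels)
    if not counts:
        return {}
    if target is None:
        target = max(counts.values())
    # Classes already at or above the target need no synthetic images.
    return {cls: max(0, target - n) for cls, n in sorted(counts.items())}


# Hypothetical, imbalanced label list; the paper's class names are not given here.
labels = ["river"] * 5 + ["forest"] * 2 + ["urban"] * 3
quota = synthetic_quota(labels)
print(quota)
```

A generation pipeline could then request `quota[cls]` images per class, so under-represented classes receive the most synthetic samples.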
