LC4-DViT: Land-cover Creation for Land-cover Classification with Deformable Vision Transformer

Kai Wang; Siyi Chen; Weicong Pang; Chenchen Zhang; Renjun Gao; Ziru Chen; Cheng Li; Dasa Gu; Rui Huang; Alexis Kai Hon Lau

arXiv:2511.22812·cs.CV·May 8, 2026

TL;DR

This paper introduces LC4-DViT, a framework that pairs generative data augmentation with a deformation-aware Vision Transformer (DViT) to improve high-resolution land-cover classification accuracy.

Contribution

It proposes a novel framework integrating GPT-4o-guided image synthesis and deformation-aware transformers for more accurate and transferable land-cover mapping.

Findings

01. Achieves over 95% accuracy on eight land-cover classes.

02. Outperforms baseline ViT and other models like ResNet50 and MobileNetV2.

03. Demonstrates good transferability across datasets.

Abstract

Land-cover underpins ecosystem services, hydrologic regulation, disaster-risk reduction, and evidence-based land planning; timely, accurate land-cover maps are therefore critical for environmental stewardship. Remote sensing-based land-cover classification offers a scalable route to such maps but is hindered by scarce and imbalanced annotations and by geometric distortions in high-resolution scenes. We propose LC4-DViT (Land-cover Creation for Land-cover Classification with Deformable Vision Transformer), a framework that combines generative data creation with a deformation-aware Vision Transformer. A text-guided diffusion pipeline uses GPT-4o-generated scene descriptions and super-resolved exemplars to synthesize class-balanced, high-fidelity training images, while DViT couples a DCNv4 deformable convolutional backbone with a Vision Transformer encoder to jointly capture fine-scale…
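
The abstract describes synthesizing class-balanced training images. As a rough illustration of what "class-balanced" could mean in practice, the sketch below computes how many synthetic images each class would need to match the largest class. The function name `synthesis_quota`, the class names, and the counts are hypothetical and are not taken from the paper; the authors' actual balancing procedure is not given in the text above.

```python
from collections import Counter

def synthesis_quota(labels, target=None):
    """Return, per class, how many synthetic images are needed to reach `target`.

    Hypothetical helper for illustration only; not the paper's procedure.
    """
    counts = Counter(labels)
    # Assumption: by default, balance every class up to the largest class size.
    if target is None:
        target = max(counts.values())
    # Classes already at or above the target need no synthetic images.
    return {cls: max(0, target - n) for cls, n in sorted(counts.items())}

# Toy, made-up label list with an imbalanced class distribution.
labels = ["forest"] * 5 + ["water"] * 2 + ["farmland"] * 3
quota = synthesis_quota(labels)
print(quota)  # {'farmland': 2, 'forest': 0, 'water': 3}
```

A quota of this kind would tell a generative pipeline how many images to request per class; the choice of balancing to the largest class rather than to a fixed size is an assumption made here for simplicity.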
