MAgSeg: Segmentation of Agricultural Landscapes in High-Resolution Satellite Imagery using Multimodal Large Language Models

Piyush Tiwary; Utkarsh Ahuja; Depanshu Sani; Aishwarya Jayagopal; Sagar Gubbi; Subhashini Venugopalan; Alok Talekar; Vaibhav Rajan

arXiv:2605.16179 · cs.CV · May 18, 2026

TL;DR

MAgSeg is a decoder-free multimodal large language model (MLLM) approach that segments complex smallholder agricultural landscapes in high-resolution satellite images, addressing the context-length bottleneck and the domain alignment gap that limit current MLLM methods.

Contribution

The paper introduces a new instruction tuning data format and an efficient architecture that let standard MLLMs perform detailed satellite image segmentation without auxiliary vision decoders.

Findings

01. MAgSeg outperforms state-of-the-art MLLM baselines on datasets from three countries.

02. The approach enables scalable fine-tuning on high-resolution satellite imagery.

03. It effectively maps smallholder agricultural environments in the Global South.

Abstract

Agricultural landscape segmentation in the Global South is challenging because these landscapes are characterized by fragmented plots, high intra-class variance, and a scarcity of labeled training data. Multimodal Large Language Models (MLLMs) have driven recent advances in segmentation. However, current approaches encounter critical context length bottlenecks and a domain alignment gap in understanding satellite features. We address these limitations through MAgSeg, a novel, decoder-free MLLM segmentation approach. MAgSeg is architecturally efficient and enables standard MLLMs to segment complex smallholder agricultural landscapes from high-resolution satellite imagery, without requiring auxiliary vision decoders. We introduce a novel instruction tuning data format designed to enable scalable fine-tuning and post-training on high-resolution satellite imagery, which…
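
To make the "decoder-free" idea and the context length concern more concrete: without an auxiliary vision decoder, the model has to emit the segmentation as text, so the mask must be serialized into tokens. The sketch below shows one possible serialization, a row-wise run-length encoding of class ids. This is purely an illustrative assumption and is not the data format proposed in the paper, which this excerpt does not describe; the function names `mask_to_text` and `text_to_mask` and the `classxcount` notation are hypothetical.

```python
from itertools import groupby


def mask_to_text(mask):
    """Serialize a 2D class-id mask as row-wise run-length text.

    Hypothetical format (not the paper's): each run is written as
    "<class>x<count>", runs are space-separated, rows are newline-separated.
    """
    rows = []
    for row in mask:
        runs = [f"{cls}x{len(list(group))}" for cls, group in groupby(row)]
        rows.append(" ".join(runs))
    return "\n".join(rows)


def text_to_mask(text):
    """Invert mask_to_text, recovering the 2D list of class ids."""
    mask = []
    for line in text.split("\n"):
        row = []
        for run in line.split():
            cls, count = run.split("x")
            row.extend([int(cls)] * int(count))
        mask.append(row)
    return mask


# Toy 3x4 mask with three classes (e.g. 0 = background, 1 and 2 = plot types).
mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 2, 1],
]
encoded = mask_to_text(mask)
print(encoded)
# The text round-trips back to the original mask.
assert text_to_mask(encoded) == mask
```

Under this toy scheme the output length grows with the number of runs, so fragmented plots at high resolution would produce long sequences, which is one way to see why context length can become a bottleneck for text-based segmentation.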
