Generalized Referring Expression Segmentation on Aerial Photos
Luís Marnoto, Alexandre Bernardino, Bruno Martins

TL;DR
This paper introduces Aerial-D, a large-scale dataset for referring expression segmentation in aerial imagery, and demonstrates a unified model that performs well across diverse conditions including historical and degraded images.
Contribution
The work presents a new dataset, Aerial-D, and adapts the RSRefSeg architecture for unified segmentation from text in aerial images, including historical and degraded data.
Findings
Achieved competitive performance on existing benchmarks.
Maintained high accuracy on monochrome, sepia, and grainy images.
Demonstrated the effectiveness of combined training on modern and historical aerial datasets.
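To make the three degraded conditions named in the findings concrete, here is a minimal sketch of how monochrome, sepia, and grainy variants of an RGB image could be simulated. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the BT.601 luma weights, the commonly used sepia matrix, and the Gaussian grain with `sigma=12.0` are all choices made for this example.

```python
import random

def clamp(value):
    """Round to the nearest integer and clip to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))

def to_monochrome(pixel):
    """Grayscale via ITU-R BT.601 luma weights (an assumed choice)."""
    r, g, b = pixel
    y = clamp(0.299 * r + 0.587 * g + 0.114 * b)
    return (y, y, y)

def to_sepia(pixel):
    """Sepia tone using a widely used 3x3 colour matrix (an assumed choice)."""
    r, g, b = pixel
    return (
        clamp(0.393 * r + 0.769 * g + 0.189 * b),
        clamp(0.349 * r + 0.686 * g + 0.168 * b),
        clamp(0.272 * r + 0.534 * g + 0.131 * b),
    )

def add_grain(pixel, rng, sigma=12.0):
    """Add independent Gaussian noise per channel; sigma is hypothetical."""
    return tuple(clamp(c + rng.gauss(0.0, sigma)) for c in pixel)

def degrade(image, mode, seed=0):
    """Apply one degradation to an image stored as rows of (r, g, b) tuples."""
    rng = random.Random(seed)  # seeded so the grain is reproducible
    ops = {
        "monochrome": to_monochrome,
        "sepia": to_sepia,
        "grain": lambda p: add_grain(p, rng),
    }
    return [[ops[mode](p) for p in row] for row in image]

# Tiny 2x2 example image.
image = [[(255, 255, 255), (0, 0, 0)],
         [(200, 120, 40), (30, 90, 160)]]
mono = degrade(image, "monochrome")
sepia = degrade(image, "sepia")
grainy = degrade(image, "grain", seed=42)
```

These transforms change only pixel values, so a segmentation mask annotated on the original image would remain valid for each degraded copy.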
Abstract
Referring expression segmentation is a fundamental task in computer vision that integrates natural language understanding with precise visual localization of target regions. Aerial imagery (e.g., modern aerial photos collected by drones, historical photos from aerial archives, and high-resolution satellite imagery) presents unique challenges: spatial resolution varies widely across datasets, the use of color is inconsistent, targets often shrink to only a few pixels, and scenes contain very high object densities and partially occluded objects. This work presents Aerial-D, a new large-scale referring expression segmentation dataset for aerial imagery, comprising 37,288 images with 1,522,523 referring expressions that cover 259,709 annotated targets, spanning individual object instances, groups of instances, and semantic regions covering 21 distinct…
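The dataset counts quoted in the abstract imply some simple averages: roughly 5.9 expressions per annotated target and about 7 targets per image. The short calculation below derives them. Only the three totals come from the abstract; the variable names are illustrative, and the averages say nothing about how expressions or targets are actually distributed across images.

```python
# Totals as stated in the abstract.
num_images = 37_288
num_expressions = 1_522_523
num_targets = 259_709

# Derived averages (simple ratios, not figures reported by the authors).
expressions_per_target = num_expressions / num_targets
targets_per_image = num_targets / num_images
expressions_per_image = num_expressions / num_images

print(f"expressions per target: {expressions_per_target:.2f}")
print(f"targets per image:      {targets_per_image:.2f}")
print(f"expressions per image:  {expressions_per_image:.2f}")
```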
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Hand Gesture Recognition Systems
