SatBLIP: Context Understanding and Feature Identification from Satellite Imagery with Vision-Language Learning
Xue Wu, Shengting Cao, Shenglin Li, Jiaqi Gong

TL;DR
SatBLIP is a novel vision-language framework that enhances rural environmental risk assessment from satellite imagery by combining contrastive alignment, tailored captioning, and interpretability techniques.
Contribution
It introduces a satellite-specific vision-language model that improves feature identification and interpretability for rural risk analysis, addressing limitations of prior methods.
Findings
Accurately predicts county-level Social Vulnerability Index (SVI) from satellite images.
Identifies key features like roof condition, street width, and vegetation that influence risk predictions.
Enables interpretable mapping of rural environmental risks.
Abstract
Rural environmental risks are shaped by place-based conditions (e.g., housing quality, road access, land-surface patterns), yet standard vulnerability indices are coarse and provide limited insight into risk contexts. We propose SatBLIP, a satellite-specific vision-language framework for rural context understanding and feature identification that predicts county-level Social Vulnerability Index (SVI). SatBLIP addresses limitations of prior remote sensing pipelines (handcrafted features, manual virtual audits, and natural-image-trained VLMs) by coupling contrastive image-text alignment with bootstrapped captioning tailored to satellite semantics. We use GPT-4o to generate structured descriptions of satellite tiles (roof type/condition, house size, yard attributes, greenery, and road context), then fine-tune a satellite-adapted BLIP model to generate captions for unseen images. Captions are…
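To make "contrastive image-text alignment" concrete, here is a minimal sketch of the generic symmetric contrastive (InfoNCE-style) loss commonly used for image-text alignment. It is not taken from the paper: the function names, the toy 2-D embeddings, and the temperature of 0.07 are illustrative assumptions, and SatBLIP's actual objective may differ in its details.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    # Assumes v is not the zero vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    # image_embs[k] and text_embs[k] are assumed to be a matched tile/caption pair.
    # temperature=0.07 is an illustrative default, not a value from the paper.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # Pairwise cosine similarities, sharpened by the temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]

    # Image-to-text: each image should rank its own caption highest.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image: each caption should rank its own image highest.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(cols[k])[k] for k in range(n)) / n

    return 0.5 * (i2t + t2i)

# Toy example: two tiles and two captions in a shared 2-D embedding space.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]   # captions match their tiles
shuffled_texts = [[0.0, 1.0], [1.0, 0.0]]  # captions swapped

loss_aligned = contrastive_loss(images, aligned_texts)
loss_shuffled = contrastive_loss(images, shuffled_texts)
```

The loss is low when each tile embedding sits closest to its own caption embedding and high when the pairs are mismatched, which is the pressure that pulls matched image and text representations together during training.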
