Towards Zero-Shot Annotation of the Built Environment with Vision-Language Models (Vision Paper)

Bin Han; Yiwei Yang; Anat Caspi; Bill Howe

arXiv:2408.00932 · cs.CV · August 5, 2024

TL;DR

This paper explores using vision-language models with segmentation prompting to automatically annotate diverse urban features from satellite images, aiming to reduce manual effort and improve urban infrastructure data quality.

Contribution

The paper introduces a novel prompting strategy that helps vision-language models better identify esoteric built-environment features in satellite imagery.

Findings

1. Zero-shot prompting fails to annotate urban features effectively.
2. Pre-segmentation prompting achieves an intersection over union (IoU) of up to 40%.
3. The results suggest potential for scalable, automatic annotation of the urban environment.
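Intersection over union (IoU), the metric behind the pre-segmentation result, divides the area where a predicted mask and a ground-truth mask overlap by the area covered by either one: IoU = |P ∩ T| / |P ∪ T|. The Python sketch below computes it for binary pixel masks. The function name `mask_iou`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are illustrative assumptions rather than details from the paper, and the toy masks are not the paper's data.

```python
def mask_iou(pred, truth):
    """Pixel-level IoU of two same-shape binary masks (nested lists of 0/1).

    Assumption: if both masks are empty, IoU is defined as 1.0.
    """
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    intersection = 0  # pixels labeled positive in both masks
    union = 0         # pixels labeled positive in at least one mask
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    return intersection / union if union else 1.0


# Toy example (hypothetical masks): the prediction covers 4 pixels,
# the ground truth covers 3, and they share 2, so IoU = 2 / (4 + 3 - 2) = 0.4.
pred = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
truth = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(mask_iou(pred, truth))  # 0.4
```

As the toy case shows, an IoU of 40% means the overlap is 40% of the combined labeled area. A prediction can cover most of the true feature and still score modestly if it also spills onto neighboring pixels.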

Abstract

Equitable urban transportation applications require high-fidelity digital representations of the built environment: not just streets and sidewalks, but bike lanes, marked and unmarked crossings, curb ramps and cuts, obstructions, traffic signals, signage, street markings, potholes, and more. Direct inspections and manual annotations are prohibitively expensive at scale. Conventional machine learning methods require substantial annotated training data for adequate performance. In this paper, we consider vision language models as a mechanism for annotating diverse urban features from satellite images, reducing the dependence on human annotation to produce large training sets. While these models have achieved impressive results in describing common objects in images captured from a human perspective, their training sets are less likely to include strong signals for esoteric features in the…


Taxonomy

Topics: 3D Surveying and Cultural Heritage