Multimodal Urban Tree Detection from Satellite and Street-Level Imagery via Annotation-Efficient Deep Learning Strategies

In Seon Kim; Ali Moghimi

arXiv:2604.03505·cs.CV·April 7, 2026

TL;DR

This paper presents a multimodal deep learning framework that combines satellite and street-level imagery for efficient urban tree detection, reducing annotation effort and improving accuracy across diverse urban environments.

Contribution

Introduces a novel multimodal framework with domain adaptation and hybrid learning strategies for scalable, annotation-efficient urban tree mapping using satellite and street-level data.

Findings

1. Hybrid learning achieved an F1-score of 0.90, a 12% improvement over the baseline.
2. Active learning effectively targeted uncertain predictions, improving detection accuracy.
3. Domain adaptation transferred knowledge across regions, reducing annotation needs.
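The summary says active learning "targeted uncertain predictions" but does not say how uncertainty was scored. As an illustration only, the sketch below shows one common criterion, entropy-based uncertainty sampling. It is not presented as the authors' method. Each image's detector confidence scores are turned into per-box binary entropies, and the images with the highest maximum entropy are sent for annotation. The function names and the max-entropy aggregation are assumptions made for this example.

```python
import math
from typing import Dict, List


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli 'tree / not tree' score p."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def image_uncertainty(scores: List[float]) -> float:
    """Aggregate per-box uncertainty for one image (max entropy; an assumption)."""
    if not scores:
        return 0.0  # no detections: treated as certain in this simple sketch
    return max(binary_entropy(s) for s in scores)


def select_for_annotation(pool: Dict[str, List[float]], budget: int) -> List[str]:
    """Hypothetical helper: return the `budget` image IDs with the most uncertain detections."""
    ranked = sorted(pool, key=lambda img: image_uncertainty(pool[img]), reverse=True)
    return ranked[:budget]


# Usage: detector scores per unlabeled image (made-up example values).
pool = {"a": [0.98, 0.95], "b": [0.52, 0.9], "c": [0.7], "d": []}
print(select_for_annotation(pool, 2))  # ['b', 'c']
```

Scores near 0.5 are the most informative to label, so image `b`, which has a box at 0.52, ranks first. One limitation of this simple criterion is that images with no detections score zero. Trees the detector missed entirely are therefore never prioritized.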

Abstract

Beyond the immediate biophysical benefits, urban trees play a foundational role in environmental sustainability and disaster mitigation. Precise mapping of urban trees is essential for environmental monitoring, post-disaster assessment, and strengthening policy. However, the transition from traditional, labor-intensive field surveys to scalable automated systems remains limited by high annotation costs and poor generalization across diverse urban scenarios. This study introduces a multimodal framework that integrates high-resolution satellite imagery with ground-level Google Street View to enable scalable and detailed urban tree detection under limited-annotation conditions. The framework first leverages satellite imagery to localize tree candidates and then retrieves targeted ground-level views for detailed detection, significantly reducing inefficient street-level sampling. To address…
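The abstract describes a two-stage flow: satellite imagery localizes tree candidates, and targeted ground-level views are then retrieved for detailed detection. The sketch below illustrates how that hand-off could work geometrically. It is not the authors' implementation. It assumes the satellite tile is in WGS84 longitude/latitude with a GDAL-style affine geotransform, and that a list of candidate street-level camera positions is already available. Under those assumptions it converts a detected box center to coordinates, picks the nearest camera, and computes the compass heading that faces the tree. A heading like this is the kind of value one would pass to a street-level imagery request, such as the `heading` parameter of the Google Street View Static API. All function names here are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Assumed GDAL-style geotransform for a WGS84 tile:
# (lon_origin, pixel_width, row_rotation, lat_origin, col_rotation, pixel_height)
GeoTransform = Tuple[float, float, float, float, float, float]
LatLon = Tuple[float, float]


def pixel_to_geo(gt: GeoTransform, col: float, row: float) -> LatLon:
    """Map a pixel position in the satellite tile to (lat, lon)."""
    lon = gt[0] + col * gt[1] + row * gt[2]
    lat = gt[3] + col * gt[4] + row * gt[5]
    return lat, lon


def box_center_geo(gt: GeoTransform, box: Tuple[float, float, float, float]) -> LatLon:
    """Center of a detected tree box (x_min, y_min, x_max, y_max) in pixels, as (lat, lon)."""
    x_min, y_min, x_max, y_max = box
    return pixel_to_geo(gt, (x_min + x_max) / 2, (y_min + y_max) / 2)


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(math.sqrt(h))


def initial_bearing(frm: LatLon, to: LatLon) -> float:
    """Compass bearing in degrees [0, 360) from `frm` toward `to`."""
    phi1, phi2 = math.radians(frm[0]), math.radians(to[0])
    dlon = math.radians(to[1] - frm[1])
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def plan_view(tree: LatLon, cameras: Sequence[LatLon]) -> Tuple[LatLon, float]:
    """Pick the nearest camera position and the heading that points it at the tree."""
    cam = min(cameras, key=lambda c: haversine_m(c, tree))
    return cam, initial_bearing(cam, tree)


# Usage with made-up coordinates: a tree roughly 111 m north of the first camera.
cam, heading = plan_view((44.001, -93.0), [(44.0, -93.0), (44.01, -93.0)])
print(cam, round(heading, 1))  # (44.0, -93.0) 0.0
```

Requesting only views aimed at satellite-derived candidates, rather than sampling street-level imagery densely, is what the abstract credits with reducing inefficient street-level sampling. The geometry above is one simple way such targeting could be expressed.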
