Multimodal Urban Tree Detection from Satellite and Street-Level Imagery via Annotation-Efficient Deep Learning Strategies
In Seon Kim, Ali Moghimi

TL;DR
This paper presents a multimodal deep learning framework that combines satellite and street-level imagery for urban tree detection, reducing annotation effort while improving accuracy across diverse urban environments.
Contribution
Introduces a novel multimodal framework with domain adaptation and hybrid learning strategies for scalable, annotation-efficient urban tree mapping using satellite and street-level data.
Findings
Hybrid learning achieved an F1-score of 0.90, a 12% improvement over baseline.
Active learning effectively targeted uncertain predictions, improving detection accuracy.
Domain adaptation transferred knowledge across regions, reducing annotation needs.
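To make the active learning finding concrete, here is a minimal sketch of uncertainty-based sample selection. It is not the authors' implementation: the scoring rule (closeness of each detection confidence to 0.5), the function names `image_uncertainty` and `select_for_annotation`, and the example image IDs are all assumptions chosen for illustration.

```python
from typing import Dict, List


def image_uncertainty(confidences: List[float]) -> float:
    """Mean closeness of detection confidences to 0.5 (1.0 = most uncertain).

    Assumed scoring rule for illustration; the paper may use a different one.
    """
    if not confidences:
        # Assumption: images with no detections are not prioritized.
        return 0.0
    return sum(1.0 - abs(2.0 * p - 1.0) for p in confidences) / len(confidences)


def select_for_annotation(predictions: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the `budget` image IDs whose detections are most uncertain."""
    ranked = sorted(
        predictions,
        key=lambda image_id: image_uncertainty(predictions[image_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical detector confidences per image.
preds = {
    "img_a": [0.95, 0.97],  # confident detections
    "img_b": [0.52, 0.48],  # near the decision boundary
    "img_c": [0.70, 0.90],  # mixed
}
selected = select_for_annotation(preds, budget=2)
print(selected)
```

Under this rule, images whose detections sit near the decision boundary are sent to annotators first, which is one common way to spend a limited labeling budget.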
Abstract
Beyond the immediate biophysical benefits, urban trees play a foundational role in environmental sustainability and disaster mitigation. Precise mapping of urban trees is essential for environmental monitoring, post-disaster assessment, and strengthening policy. However, the transition from traditional, labor-intensive field surveys to scalable automated systems remains limited by high annotation costs and poor generalization across diverse urban scenarios. This study introduces a multimodal framework that integrates high-resolution satellite imagery with ground-level Google Street View to enable scalable and detailed urban tree detection under limited-annotation conditions. The framework first leverages satellite imagery to localize tree candidates and then retrieves targeted ground-level views for detailed detection, significantly reducing inefficient street-level sampling. To address…
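The satellite-to-street-level step described in the abstract can be illustrated with a small sketch: given a tree candidate located in satellite imagery and a nearby street-level camera position, compute the compass heading from the camera toward the candidate so that only the relevant view is requested. This is an assumption about how targeted retrieval might work, not the paper's documented procedure; the function name `heading_to_candidate` and the coordinates are hypothetical.

```python
import math


def heading_to_candidate(cam_lat: float, cam_lon: float,
                         tree_lat: float, tree_lon: float) -> float:
    """Initial great-circle bearing from camera to tree, in degrees [0, 360).

    0 = north, 90 = east. Uses the standard forward-azimuth formula.
    """
    phi1 = math.radians(cam_lat)
    phi2 = math.radians(tree_lat)
    dlon = math.radians(tree_lon - cam_lon)

    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0


# Hypothetical camera and tree-candidate coordinates (degrees).
camera = (44.9740, -93.2277)
candidate = (44.9741, -93.2275)
heading = heading_to_candidate(*camera, *candidate)
print(f"Request the street-level view facing {heading:.1f} degrees")
```

The resulting heading could then be supplied to whatever street-level imagery request is used, so that one targeted view per candidate replaces dense sampling along every street.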
