BlabberSeg: Real-Time Embedded Open-Vocabulary Aerial Segmentation

Haechan Mark Bong; Ricardo de Azambuja; Giovanni Beltrame

arXiv:2410.12979 · cs.RO · October 18, 2024

Open Access

TL;DR

BlabberSeg is a real-time open-vocabulary aerial image segmentation model optimized for UAVs. It runs at 16.78 Hz on an NVIDIA Jetson Orin AGX, compared with 1.81 Hz for CLIPSeg, while retaining 97.9% of CLIPSeg's segmentation accuracy, which makes on-board deployment for UAV environmental perception practical.

Contribution

We developed BlabberSeg, an optimized vision-language model based on CLIPSeg, tailored for real-time aerial segmentation on embedded UAV platforms, with substantial efficiency improvements.

Findings

01

Runs at 16.78 Hz on an NVIDIA Jetson Orin AGX (64GB) versus 1.81 Hz for the original CLIPSeg, reported as a 927.41% speed increase.

02

Maintains 97.9% of CLIPSeg's segmentation accuracy.

03

Enables real-time open-vocabulary aerial segmentation in UAV applications.
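
The reported throughput figures (16.78 Hz for BlabberSeg, 1.81 Hz for CLIPSeg on the Jetson Orin AGX) can be converted into a speed ratio and a per-frame latency. The short check below uses only those two rounded rates; the variable names are illustrative and not taken from the paper.

```python
# Rounded rates as reported for the NVIDIA Jetson Orin AGX (64GB).
blabberseg_hz = 16.78
clipseg_hz = 1.81

# How many times more frames per second BlabberSeg processes.
speed_ratio = blabberseg_hz / clipseg_hz

# Time budget per frame in milliseconds at each rate.
blabberseg_ms = 1000.0 / blabberseg_hz
clipseg_ms = 1000.0 / clipseg_hz

print(f"speed ratio: {speed_ratio:.2f}x")
print(f"per-frame latency: {blabberseg_ms:.1f} ms vs {clipseg_ms:.1f} ms")
```

With the rounded rates the ratio comes out near 9.27, which lines up with the reported 927.41% when that figure is read as a ratio of the two rates; the small gap is presumably due to rounding of the published rates.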

Abstract

Real-time aerial image segmentation plays an important role in the environmental perception of Uncrewed Aerial Vehicles (UAVs). We introduce BlabberSeg, an optimized Vision-Language Model built on CLIPSeg for on-board, real-time processing of aerial images by UAVs. BlabberSeg improves the efficiency of CLIPSeg by reusing prompt and model features, reducing computational overhead while achieving real-time open-vocabulary aerial segmentation. We validated BlabberSeg in a safe landing scenario using the Dynamic Open-Vocabulary Enhanced SafE-Landing with Intelligence (DOVESEI) framework, which uses visual servoing and open-vocabulary segmentation. BlabberSeg reduces computational costs significantly, with a speed increase of 927.41% (16.78 Hz) on an NVIDIA Jetson Orin AGX (64GB) compared with the original CLIPSeg (1.81 Hz), achieving real-time aerial segmentation with negligible loss in…
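
The abstract attributes the efficiency gain to reusing prompt and model features. As a rough illustration of the general idea only, the sketch below memoizes per-prompt embeddings so that an expensive text encoder runs once per unique prompt rather than once per frame. This is not BlabberSeg's implementation: `PromptFeatureCache`, `toy_text_encoder`, and `segment_frame` are hypothetical names, and the toy encoder is a stand-in for a real text encoder.

```python
from typing import Callable, Dict, List, Tuple

Embedding = Tuple[float, ...]


class PromptFeatureCache:
    """Memoizes prompt embeddings (hypothetical helper, not from the paper)."""

    def __init__(self, encode_text: Callable[[str], Embedding]) -> None:
        self._encode_text = encode_text
        self._cache: Dict[str, Embedding] = {}
        self.encoder_calls = 0  # counts how often the encoder actually runs

    def get(self, prompt: str) -> Embedding:
        # Run the encoder only the first time a prompt is seen.
        if prompt not in self._cache:
            self._cache[prompt] = self._encode_text(prompt)
            self.encoder_calls += 1
        return self._cache[prompt]


def toy_text_encoder(prompt: str) -> Embedding:
    # Assumption: placeholder for an expensive text encoder; not CLIP.
    return (float(len(prompt)), float(sum(map(ord, prompt)) % 97))


def segment_frame(frame_id: int, prompts: List[str],
                  cache: PromptFeatureCache) -> List[Embedding]:
    # Placeholder for a decoder step: it only gathers the prompt features
    # that a real decoder would condition on for this frame.
    return [cache.get(p) for p in prompts]


cache = PromptFeatureCache(toy_text_encoder)
prompts = ["grass", "water", "building"]
for frame_id in range(100):
    segment_frame(frame_id, prompts, cache)

print(cache.encoder_calls)  # 3: one encoder call per unique prompt
```

Over 100 frames with three fixed prompts, the encoder runs three times instead of 300. When the prompt set is stable across frames, as it might be in a landing scenario, the text-side cost is paid once up front.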

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization