AV-DTEC: Self-Supervised Audio-Visual Fusion for Drone Trajectory Estimation and Classification

Zhenyuan Xiao; Yizhuo Yang; Guili Xu; Xianglong Zeng; Shenghai Yuan

arXiv:2412.16928·cs.SD·December 24, 2024·2 cites

TL;DR

AV-DTEC introduces a lightweight self-supervised audio-visual fusion system for drone detection, enhancing robustness and accuracy in real-world conditions by integrating multi-modal features and adaptive weighting.

Contribution

The paper presents a novel self-supervised learning framework with a plug-and-play feature enhancement module and a teacher-student model for improved drone detection.

Findings

01. High accuracy in real-world multi-modality data

02. Effective cross-lighting robustness

03. Open-source code and models available

Abstract

The increasing use of compact UAVs has created significant threats to public safety, while traditional drone detection systems are often bulky and costly. To address these challenges, we propose AV-DTEC, a lightweight self-supervised audio-visual fusion-based anti-UAV system. AV-DTEC is trained using self-supervised learning with labels generated by LiDAR, and it simultaneously learns audio and visual features through a parallel selective state-space model. With the learned features, a specially designed plug-and-play primary-auxiliary feature enhancement module integrates visual features into audio features for better robustness in cross-lighting conditions. To reduce reliance on auxiliary features and align modalities, we propose a teacher-student model that adaptively adjusts the weighting of visual features. AV-DTEC demonstrates exceptional accuracy and effectiveness in real-world…
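
The abstract describes folding auxiliary visual features into primary audio features with an adaptively adjusted weight. As a rough illustration of that idea only, the following is a minimal sketch assuming a single scalar sigmoid gate computed from the concatenated features. The names `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical and come from neither the paper nor its repository; the actual enhancement module and teacher-student weighting are not specified in the abstract and may differ substantially.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(audio, visual, gate_weights, gate_bias=0.0):
    """Fuse auxiliary visual features into primary audio features.

    Assumption (not from the paper): one scalar gate in [0, 1] is computed
    from the concatenated features and scales the visual contribution.
    A gate near 0 leaves the audio features unchanged, which is the
    behavior one would want when the visual input is unreliable.
    """
    if len(audio) != len(visual):
        raise ValueError("audio and visual features must have equal length")
    if len(gate_weights) != len(audio) + len(visual):
        raise ValueError("gate_weights must match the concatenated length")

    joint = list(audio) + list(visual)
    gate = sigmoid(sum(w * x for w, x in zip(gate_weights, joint)) + gate_bias)

    # Primary features pass through; auxiliary features are added, scaled by the gate.
    fused = [a + gate * v for a, v in zip(audio, visual)]
    return fused, gate
```

In this sketch, a strongly negative `gate_bias` drives the gate toward zero so the output falls back to the audio features alone. In a trained system the gate parameters would be learned rather than set by hand.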

Code & Models

Repositories

amazingday1/av-detc (Official)

Taxonomy

Topics: Video Surveillance and Tracking Methods

Methods: ALIGN