Single-Channel Distance-Based Source Separation for Mobile GPU in Outdoor and Indoor Environments
Hanbin Bae, Byungjun Kang, Jiwon Kim, Jaeyong Hwang, Hosang Sung, Hoon-Young Cho

TL;DR
This paper introduces a distance-based source separation (DSS) model optimized for mobile GPUs. It targets outdoor environments, which earlier indoor-focused work largely left unexplored, and reports better energy efficiency and real-time inference speed on mobile devices.
Contribution
The study presents an outdoor-focused DSS model built on a two-stage conformer block and linear relation-aware self-attention (RSA), deployed on mobile GPUs through the TensorFlow Lite GPU delegate.
Findings
Enhanced separation performance in outdoor environments.
Improved energy efficiency on mobile devices.
Real-time inference capability achieved.
Abstract
This study emphasizes the significance of exploring distance-based source separation (DSS) in outdoor environments. Unlike existing studies that primarily focus on indoor settings, the proposed model is designed to capture the unique characteristics of outdoor audio sources. It incorporates advanced techniques, including a two-stage conformer block, a linear relation-aware self-attention (RSA), and a TensorFlow Lite GPU delegate. While the linear RSA may not capture physical cues as explicitly as the quadratic RSA, the linear RSA enhances the model's context awareness, leading to improved performance on the DSS that requires an understanding of physical cues in outdoor and indoor environments. The experimental results demonstrated that the proposed model overcomes the limitations of existing approaches and considerably enhances energy efficiency and real-time inference speed on mobile…
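The abstract contrasts a linear RSA with a quadratic one. As a rough illustration of what "linear" versus "quadratic" means for attention in general, the sketch below computes the same kernelized attention output two ways: once through an explicit n-by-n weight matrix, and once by summarizing keys and values into a small d-by-d matrix first. This is generic linear attention with an assumed elu(x) + 1 feature map. It is not the paper's RSA, it omits the relation-aware (relative position) terms, and all function names are hypothetical.

```python
import math


def feature_map(x):
    # elu(x) + 1, an assumed positive feature map (not taken from the paper)
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def quadratic_attention(Q, K, V):
    """Kernelized attention via explicit pairwise weights: O(n^2 * d)."""
    Qf = [feature_map(q) for q in Q]
    Kf = [feature_map(k) for k in K]
    dv = len(V[0])
    out = []
    for q in Qf:
        # One weight per (query, key) pair: this is the n x n cost.
        weights = [sum(a * b for a, b in zip(q, k)) for k in Kf]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(dv)])
    return out


def linear_attention(Q, K, V):
    """Same output, with keys/values summarized once: O(n * d^2)."""
    Qf = [feature_map(q) for q in Q]
    Kf = [feature_map(k) for k in K]
    d, dv = len(Kf[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]  # running sum of phi(k) v^T
    k_sum = [0.0] * d                    # running sum of phi(k)
    for k, v in zip(Kf, V):
        for i in range(d):
            k_sum[i] += k[i]
            for j in range(dv):
                kv[i][j] += k[i] * v[j]
    out = []
    for q in Qf:
        norm = sum(q[i] * k_sum[i] for i in range(d))
        out.append([sum(q[i] * kv[i][j] for i in range(d)) / norm
                    for j in range(dv)])
    return out
```

Both functions return the same values up to floating-point error; only the order of operations differs. The linear form never materializes the n-by-n matrix, which is the usual reason such variants are attractive for long sequences on constrained hardware.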
Taxonomy
Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Underwater Vehicles and Communication Systems
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus
