SONNET: Enhancing Time Delay Estimation by Leveraging Simulated Audio

Erik Tegler; Magnus Oskarsson; Kalle Åström

arXiv:2411.13179 · cs.SD · November 21, 2024

TL;DR

SONNET is a neural network model trained on simulated audio data that significantly improves time delay estimation accuracy over classical methods like GCC-PHAT, enabling real-time applications and better self-calibration.

Contribution

This paper introduces SONNET, a learning-based time delay estimation model trained on synthetic data that outperforms classical methods on real-world data without re-training.

Findings

01 SONNET outperforms GCC-PHAT on real-world data.

02 The model enables real-time processing.

03 Improved self-calibration performance.

Abstract

Time delay estimation, also called Time-Difference-Of-Arrival (TDOA) estimation, is a critical component of many localization applications such as multilateration, direction-of-arrival estimation, and self-calibration. The task is to estimate the difference in arrival time of a signal at two different sensors. For the audio sensor modality, most current systems are based on classical methods such as the Generalized Cross-Correlation Phase Transform (GCC-PHAT) method. In this paper we demonstrate that learning-based methods can, even when trained on synthetic data, significantly outperform GCC-PHAT on novel real-world data. To overcome the lack of data with ground truth for the task, we train our model on a simulated dataset that is sufficiently large and varied and that captures the relevant characteristics of the real-world problem. We provide our trained model, SONNET (Simulation Optimized Neural Network…
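
For readers unfamiliar with the classical baseline named in the abstract, the following is a minimal NumPy sketch of GCC-PHAT for two sensor signals. It is an illustration only, not the authors' code and not SONNET itself; the function name `gcc_phat`, its parameters, the zero-padding length, and the small regularizer `1e-12` are assumptions made for this example.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate how much y is delayed relative to x, in seconds.

    Hypothetical helper for illustration: x and y are 1-D signals from
    two sensors sampled at fs Hz; max_tau optionally bounds the search.
    """
    n = len(x) + len(y)  # zero-pad so circular correlation acts like linear
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)

    # Cross-power spectrum; the PHAT weighting keeps only the phase.
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # regularizer avoids division by zero

    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to lags in [-max_shift, +max_shift] samples.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Negative lags live at the end of cc; put them in front.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))

    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

With this sign convention, a positive return value means the signal reaches the second sensor (`y`) later than the first (`x`). The resolution of this sketch is one sample; sub-sample refinement (for example by interpolating around the peak) is a common extension that is left out here.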

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Music Technology and Sound Studies