Intensity Particle Flow SMC-PHD Filter For Audio Speaker Tracking

Yang Liu; Wenwu Wang; Volkan Kilic

arXiv:1812.01570 · cs.SD · December 5, 2018 · 49 citations

Open Access

TL;DR

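This paper introduces the IPF-SMC-PHD filter, which adds intensity particle flow (IPF) to the NPF-SMC-PHD filter so that detection probability and clutter intensity are taken into account during multi-speaker tracking. On the Task 4 sequences of the LOCATA dataset, it improves estimation accuracy over its baseline counterparts.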

Contribution

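It proposes the IPF-SMC-PHD filter, which accounts for detection probability and clutter intensity without using a data association algorithm to compute the particle flow. This addresses a limitation of the earlier NPF-SMC-PHD filter, which does not consider missed detections.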

Findings

01 Improved estimation accuracy over baseline filters on the LOCATA dataset (Task 4 sequences)

02 Missed detections and clutter handled through the detection probability and clutter intensity

03 No data association needed for particle flow calculation

Abstract

Non-zero diffusion particle flow Sequential Monte Carlo probability hypothesis density (NPF-SMC-PHD) filtering has recently been introduced for multi-speaker tracking. However, the NPF does not consider missed detections, which play a key role in estimating the number of speakers and their states. To address this limitation, we propose to use intensity particle flow (IPF) in the NPF-SMC-PHD filter. The proposed method, IPF-SMC-PHD, considers the clutter intensity and detection probability, while no data association algorithms are used for the calculation of particle flow. Experiments on the LOCATA (acoustic source Localization and Tracking) dataset with the sequences of Task 4 show that our proposed IPF-SMC-PHD filter improves the tracking performance in terms of estimation accuracy as compared to its baseline counterparts.
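
For readers unfamiliar with how detection probability and clutter intensity enter a PHD filter, the following is a minimal sketch of the standard SMC-PHD particle weight update. It is not the intensity particle flow proposed in the paper, whose equations the abstract does not give. The 1-D state, the Gaussian likelihood, the constant clutter intensity, and all function and parameter names are assumptions made for illustration.

```python
import math


def gaussian_likelihood(z, x, sigma):
    """Assumed 1-D Gaussian measurement likelihood g(z | x)."""
    return math.exp(-0.5 * ((z - x) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def phd_weight_update(particles, weights, measurements, p_detect, clutter_intensity, sigma):
    """Standard SMC-PHD weight update (illustrative, not the paper's IPF step).

    Each particle keeps a (1 - p_detect) share of its weight for the
    missed-detection case, plus one share per measurement, normalised by
    the clutter intensity and the total detection-weighted likelihood.
    """
    # Per-measurement denominator: clutter intensity + sum_j p_D * g(z | x_j) * w_j
    denominators = []
    for z in measurements:
        total = sum(
            p_detect * gaussian_likelihood(z, x, sigma) * w
            for x, w in zip(particles, weights)
        )
        denominators.append(clutter_intensity + total)

    updated = []
    for x, w in zip(particles, weights):
        detected = sum(
            p_detect * gaussian_likelihood(z, x, sigma) / d
            for z, d in zip(measurements, denominators)
        )
        updated.append(((1.0 - p_detect) + detected) * w)
    return updated


# Hypothetical example: four particles on a 1-D axis, one measurement.
particles = [0.0, 0.5, 1.0, 1.5]
weights = [0.25, 0.25, 0.25, 0.25]
new_weights = phd_weight_update(particles, weights, [0.6], 0.9, 0.5, 0.2)
estimated_count = sum(new_weights)  # sum of weights estimates the number of speakers
```

The sum of the updated weights is the estimated number of speakers. A higher clutter intensity lowers the contribution of each measurement, and a lower detection probability preserves more weight for speakers that produced no measurement. No measurement-to-speaker assignment is made at any point.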

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis