Joint model-based recognition and localization of overlapped acoustic events using a set of distributed small microphone arrays
Rupayan Chakraborty, Climent Nadeu

TL;DR
This paper introduces a model-based system that jointly detects, recognizes, and localizes overlapping acoustic events using several distributed small microphone arrays, rather than handling each task separately. Tests on two datasets show an advantage for the joint approach.
Contribution
It presents a novel integrated approach for acoustic event recognition and localization in overlapping scenarios using distributed microphone arrays, with demonstrated experimental advantages.
Findings
Improved accuracy in recognizing overlapping acoustic events.
Effective localization of multiple simultaneous sources.
Enhanced performance with estimated priors.
Abstract
In the analysis of acoustic scenes, often the occurring sounds have to be detected in time, recognized, and localized in space. Usually, each of these tasks is done separately. In this paper, a model-based approach to jointly carry them out for the case of multiple simultaneous sources is presented and tested. The recognized event classes and their respective room positions are obtained with a single system that maximizes the combination of a large set of scores, each one resulting from a different acoustic event model and a different beamformer output signal, which comes from one of several arbitrarily-located small microphone arrays. By using a two-step method, the experimental work for a specific scenario consisting of meeting-room acoustic events, either isolated or overlapped with speech, is reported. Tests carried out with two datasets show the advantage of the proposed approach…
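The core idea in the abstract, choosing event class and room position together by maximizing a combination of scores, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the score tensor layout `scores[a, p, c]`, the use of a plain sum across arrays as the combination rule, the single-source case, and the function name `joint_decode`.

```python
import numpy as np

def joint_decode(scores):
    """Pick the (class, position) pair with the highest combined score.

    scores[a, p, c] is a hypothetical log-score of event-class model c
    evaluated on the beamformer output of array a steered toward
    candidate position p. The layout is an assumption for this sketch.
    """
    # Assumed combination rule: sum the scores over all arrays.
    combined = scores.sum(axis=0)  # shape: (positions, classes)
    # Joint decision: a single argmax over positions and classes together.
    p, c = np.unravel_index(np.argmax(combined), combined.shape)
    return int(c), int(p), float(combined[p, c])

# Toy input: 3 arrays, 4 candidate positions, 5 event classes.
scores = np.zeros((3, 4, 5))
scores[:, 2, 1] = 5.0  # every array favors class 1 at position 2

event_class, position, score = joint_decode(scores)
print(event_class, position, score)
```

Because the maximization runs over class and position at once, evidence from all arrays supports a single joint hypothesis instead of feeding two independent decisions. Handling overlapped events would require searching over combinations of hypotheses, which this sketch does not attempt.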
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
