Oscillating Statistical Moments for Speech Polarity Detection
Thomas Drugman, Thierry Dutoit

TL;DR
This paper introduces a novel speech polarity detection method using oscillating statistical moments that depend on the local fundamental frequency, improving accuracy over existing techniques.
Contribution
It presents a new approach leveraging oscillating moments with phase shifts for automatic speech polarity detection, enhancing robustness and performance.
Findings
Substantial improvement over state-of-the-art methods
Effective across 10 speech corpora
Reliable detection of speech polarity
Abstract
An inversion of the speech polarity may have a dramatic detrimental effect on the performance of various speech processing techniques. An automatic method for determining the speech polarity (which depends on the recording setup) is thus required as a preliminary step to ensure that such techniques behave properly. This paper proposes a new approach to polarity detection relying on oscillating statistical moments. These moments oscillate at the local fundamental frequency and exhibit a phase shift which depends on the speech polarity. This dependency stems from the introduction of non-linearity or higher-order statistics in the moment calculation. The resulting method is shown on 10 speech corpora to provide a substantial improvement compared to state-of-the-art techniques.
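To make the phase-shift idea concrete, here is a minimal sketch on a synthetic signal. It is not the authors' implementation: the moment orders (3 and 4), the Blackman window, the window length of 2.5 periods, the toy two-harmonic signal, and all function names are assumptions chosen for illustration. It relies only on a general fact: an odd-order moment changes sign when the signal is inverted, while an even-order moment does not, so their relative phase at the fundamental frequency moves by π.

```python
import numpy as np

def sliding_moment(x, order, win_len):
    # Windowed moment: weighted local average of x**order.
    # The Blackman window is an illustrative assumption.
    w = np.blackman(win_len)
    w /= w.sum()
    return np.convolve(x ** order, w, mode="same")

def phase_at_f0(y, f0, fs):
    # Phase of the f0 component of y, via a single-frequency Fourier projection.
    n = np.arange(len(y))
    return np.angle(np.sum((y - y.mean()) * np.exp(-2j * np.pi * f0 * n / fs)))

def wrap(phi):
    # Wrap an angle to the interval (-pi, pi].
    return np.angle(np.exp(1j * phi))

# Toy asymmetric periodic signal (hypothetical stand-in for voiced speech).
fs, f0 = 16000, 100.0
t = np.arange(int(0.2 * fs)) / fs
x = np.cos(2 * np.pi * f0 * t) + 0.5 * np.cos(4 * np.pi * f0 * t)
win_len = int(2.5 * fs / f0)  # assumed window length: 2.5 pitch periods

def moment_phase_shift(signal):
    # Relative phase at f0 between an odd-order and an even-order moment.
    y3 = sliding_moment(signal, 3, win_len)
    y4 = sliding_moment(signal, 4, win_len)
    return wrap(phase_at_f0(y3, f0, fs) - phase_at_f0(y4, f0, fs))

shift_pos = moment_phase_shift(x)    # original polarity
shift_neg = moment_phase_shift(-x)   # inverted polarity
print(shift_pos, shift_neg)
```

In this toy setting the two printed phase shifts differ by π, so the sign of the relative phase separates the two polarities. A real detector would additionally need a pitch estimate and voiced-frame selection, which this sketch leaves out.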
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
