An Environmental Feature Representation in I-vector Space for Room Verification and Metadata Estimation
Desmond Caulley

TL;DR
This paper explores the use of environmental feature representations derived from i-vectors for room verification and acoustic metadata estimation, demonstrating their effectiveness and proposing methods to enhance verification accuracy.
Contribution
It introduces the application of e-vectors for room verification and proposes new methods for estimating SNR and reverberation from these features.
Findings
E-vectors can be effectively used for room verification with low error rates.
Proposed methods accurately estimate SNR and reverberation from e-vectors.
Augmenting e-vectors with metadata improves room verification performance.
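Verification performance of this kind is usually summarized by the equal error rate (EER), the operating point where the false-accept and false-reject rates coincide. The sketch below is a minimal illustration of that metric, not the paper's evaluation code: the function name `equal_error_rate` and the toy scores are hypothetical, and it assumes a trial is accepted when its score is at or above the threshold.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold), sweeping every observed score as a threshold.

    target_scores: scores for same-room trials (hypothetical input).
    nontarget_scores: scores for different-room trials (hypothetical input).
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # A trial is accepted when its score is >= t (assumed convention).
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, made up for illustration only.
same_room = [0.9, 0.8, 0.7, 0.4]
diff_room = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(same_room, diff_room)
```

With a finite trial list the two error curves rarely cross exactly, so this sketch reports the midpoint at the threshold where they are closest; interpolating between neighbouring thresholds is a common alternative.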
Abstract
This paper investigates the application of environmental feature representations for room verification tasks and acoustic metadata estimation. Audio recordings contain both speaker and non-speaker information. We refer to the non-speaker-related information, including channel and other environmental factors, as e-vectors. I-vectors, commonly used in speaker identification, are extracted in the total variability space and capture both speaker and channel-environment information without discrimination. Accordingly, e-vectors can be extracted from i-vectors using methods such as linear discriminant analysis. In this paper, we first demonstrate that e-vectors can be successfully applied to room verification tasks with a low equal error rate. Second, we propose two methods for estimating metadata information, namely signal-to-noise ratio (SNR) and reverberation time (T60), from these e-vectors. When…
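The abstract names linear discriminant analysis as one way to obtain e-vectors from i-vectors. The sketch below illustrates that idea under loud assumptions: the "i-vectors" are synthetic Gaussian vectors rather than real extractor output, the LDA is trained on room labels, and trials are scored with cosine similarity, which is a common choice but not confirmed by the source. The names `fit_lda`, `project`, and `cosine` are hypothetical.

```python
import numpy as np


def fit_lda(X, labels, n_components):
    """Fisher LDA: directions maximizing between-room over within-room scatter."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    mean = X.mean(axis=0)
    dim = X.shape[1]
    Sw = np.zeros((dim, dim))  # within-room scatter
    Sb = np.zeros((dim, dim))  # between-room scatter
    for room in np.unique(labels):
        Xc = X[labels == room]
        mu = Xc.mean(axis=0)
        Sw += (Xc - mu).T @ (Xc - mu)
        d = (mu - mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
    Sw += 1e-6 * np.eye(dim)  # small ridge so the solve is well conditioned
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[:n_components]
    return mean, evecs.real[:, order]


def project(X, mean, W):
    """Map i-vectors to the LDA subspace (the stand-in e-vectors here)."""
    return (np.asarray(X, dtype=float) - mean) @ W


def cosine(a, b):
    """Cosine similarity used as the verification score (an assumption)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-in for i-vectors: 3 rooms, 50 recordings each, 20 dims.
rng = np.random.default_rng(0)
centers = 3.0 * rng.normal(size=(3, 20))  # one offset per room
labels = np.repeat(np.arange(3), 50)
ivectors = centers[labels] + rng.normal(size=(150, 20))

mean, W = fit_lda(ivectors, labels, n_components=2)
evectors = project(ivectors, mean, W)

# Average score over same-room pairs and over different-room pairs.
same, diff = [], []
for i in range(0, 150, 5):
    for j in range(i + 1, 150, 5):
        score = cosine(evectors[i], evectors[j])
        (same if labels[i] == labels[j] else diff).append(score)
same_mean, diff_mean = np.mean(same), np.mean(diff)
```

With three rooms, LDA yields at most two discriminative directions, which is why `n_components=2` is used. On real data the projection would be trained on held-out recordings, and the rooms being verified would not necessarily appear in training.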
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
