Dirichlet Mixture Model based VQ Performance Prediction for Line Spectral Frequency
Zhanyu Ma

TL;DR
This paper models the distribution of line spectral frequency parameters using a Dirichlet mixture model to predict vector quantization performance and estimate the bit rate needed for transparent speech coding.
Contribution
It introduces a DMM-based approach to derive performance bounds and links spectral distortion to MSE for improved LSF vector quantization analysis.
Findings
Derived performance bounds for LSF VQ using DMM.
Established a polynomial mapping between LSD and MSE.
Estimated minimum bit rate for transparent coding.
Abstract
In this paper, we continue our previous work on Dirichlet mixture model (DMM)-based VQ to derive the performance bound of LSF VQ. The LSF parameters are transformed into the ΔLSF domain, and the underlying distribution of the ΔLSF parameters is modelled by a DMM with a finite number of mixture components. The quantization distortion, in terms of the mean squared error (MSE), is calculated with high-rate theory. The mapping between the perceptually motivated log spectral distortion (LSD) and the MSE is empirically approximated by a polynomial. With this mapping function, the minimum bit rate required for transparent coding of the LSF is estimated.
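The domain transform mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common ΔLSF construction, which the abstract does not spell out: an ordered LSF vector in (0, π) is padded with 0 and π, differenced, and divided by π. This yields a positive vector that sums to one, the kind of data a Dirichlet distribution is defined on. The function names `lsf_to_delta_lsf` and `delta_lsf_to_lsf` are hypothetical and chosen only for illustration.

```python
import numpy as np


def lsf_to_delta_lsf(lsf):
    """Map an ordered LSF vector in (0, pi) to a point on the unit simplex.

    Assumed construction (not taken from the abstract): differences between
    adjacent LSFs, with 0 and pi as the outer boundaries, scaled by 1/pi.
    A K-dimensional LSF vector gives a (K+1)-dimensional output.
    """
    lsf = np.asarray(lsf, dtype=float)
    if np.any(lsf <= 0) or np.any(lsf >= np.pi) or np.any(np.diff(lsf) <= 0):
        raise ValueError("LSFs must be strictly increasing and lie in (0, pi)")
    padded = np.concatenate(([0.0], lsf, [np.pi]))
    return np.diff(padded) / np.pi


def delta_lsf_to_lsf(delta):
    """Invert the transform: cumulative sum of all but the last element."""
    delta = np.asarray(delta, dtype=float)
    return np.pi * np.cumsum(delta[:-1])


# Illustrative 4-dimensional LSF vector (made-up values, in radians).
lsf = np.array([0.3, 0.7, 1.2, 2.0])
delta = lsf_to_delta_lsf(lsf)
recovered = delta_lsf_to_lsf(delta)
```

Because the last ΔLSF element is fixed by the others, the inverse only needs the first K elements, so the transform loses no information.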
Taxonomy
Topics: Advanced Data Compression Techniques · Image and Video Quality Assessment · Video Coding and Compression Technologies
