Comparison of Uniform and Random Sampling for Speech and Music Signals

Nematollah Zarmehi; Sina Shahsavari; Farokh Marvasti

arXiv:1705.01457·eess.AS·September 18, 2017

TL;DR

This paper compares uniform and random sampling methods for speech and music signals, demonstrating that uniform sampling with cubic spline interpolation yields superior performance over other techniques.

Contribution

It provides a comparative analysis of sampling schemes and introduces an effective recovery method using cubic spline interpolation for audio signals.

Findings

1. Uniform sampling with cubic spline interpolation outperforms other methods.
2. Simulation results validate the effectiveness of the proposed approach.
3. Adaptive thresholding enhances recovery quality.

Abstract

In this paper, we will provide a comparison between uniform and random sampling for speech and music signals. There are various sampling and recovery methods for audio signals. Here, we only investigate uniform and random schemes for sampling and basic low-pass filtering and iterative method with adaptive thresholding for recovery. The simulation results indicate that uniform sampling with cubic spline interpolation outperforms other sampling and recovery methods.

Tables

Table I: Sampling and recovery schemes.

Short Name	Sampling	Description
U-AF-FFT-Sp	AF & Uniform	FFT as AF & Spline interpolation
U-AF-FFT	AF & Uniform	FFT as AF & LP-filtering
U-AF-FIR-Sp	AF & Uniform	FIR as AF & Spline interpolation
U-AF-FIR	AF & Uniform	FIR as AF & LP-filtering
R-IMATI	Random	IMATI
R-Sp	Random	Spline interpolation

Table II: Objective performance metrics.

Name	Unit	Description
SNR	dB	$\mathrm{SNR}(x,\hat{x}) = 20\log\left(\frac{\|x\|}{\|x-\hat{x}\|}\right)$
PESQ	Dimensionless	A score from 1.0 (worst) to 4.5 (best)
CPU Time	seconds	Measured with the tic and toc commands in MATLAB

Equations

S(x)=\begin{cases}C_{1}(x), & x_{0}\leq x\leq x_{1}\\ \quad\vdots & \\ C_{i}(x), & x_{i-1}\leq x\leq x_{i}\\ \quad\vdots & \\ C_{n}(x), & x_{n-1}\leq x\leq x_{n},\end{cases}

Full text

Comparison of Uniform and Random Sampling for Speech and Music Signals

Nematollah Zarmehi, Sina Shahsavari, and Farokh Marvasti

Advanced Communication Research Institute

Department of Electrical Engineering

Sharif University of Technology, Tehran, Iran

Email: http://zarmehi.ir/contact.html

I Introduction

Various sampling and recovery methods have been proposed in the field of signal processing. The Nyquist-Shannon theorem gives a condition under which a band-limited signal can be recovered from its samples [1, 2, 3]. Uniform sampling with an anti-aliasing filter, followed by low-pass (LP) filtering for recovery, was the standard approach for several decades. Later, other sampling methods such as non-uniform sampling [4, 5, 6], periodic non-uniform sampling [7, 8], and random sampling [9, 10] were proposed.

In this paper, we provide a comparison between uniform and random sampling for speech and music signals. We use basic LP filtering and spline interpolation for uniform sampling, and the Iterative Method with Adaptive Thresholding (IMAT) for random sampling.

II Sampling and Recovery Methods

This section introduces the sampling and recovery methods that we compare. We consider uniform and random sampling schemes for speech and music signals, along with different recovery methods: basic LP filtering, spline interpolation, and IMAT [11, 12]. IMATI is a version of the IMAT algorithm that applies an interpolation operator in each iteration [13]. Table I lists the sampling and recovery schemes used in this paper. The abbreviation AF stands for anti-aliasing filter.
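The paper only names IMAT and refers to [11, 12] for its definition, so the following is a rough sketch of a generic iterative recovery with a decaying threshold, written in that spirit rather than taken from the paper. The update rule, the DCT as the sparsifying transform, the exponential threshold schedule, and the parameters `lam`, `alpha`, and `n_iter` are all assumptions made for illustration. The interpolation step that distinguishes IMATI is not included.

```python
import numpy as np
from scipy.fft import dct, idct

def iterative_thresholding(y, mask, n_iter=100, lam=1.0, alpha=0.1):
    """Recover a signal from random samples (illustrative, IMAT-like loop).

    y    : zero-filled samples (0 where no sample was taken)
    mask : boolean array, True where a sample was kept
    """
    x = np.zeros_like(y)
    # Initial threshold: largest DCT coefficient of the zero-filled signal (assumed choice)
    beta = np.max(np.abs(dct(y, norm="ortho")))
    for k in range(n_iter):
        # Pull the estimate towards the known samples
        x = x + lam * (y - mask * x)
        # Zero every transform coefficient below an exponentially decaying threshold
        c = dct(x, norm="ortho")
        c[np.abs(c) < beta * np.exp(-alpha * k)] = 0.0
        x = idct(c, norm="ortho")
    return x

def snr_db(x, x_hat):
    return 20.0 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat))

# Toy test signal: 8 non-zero DCT coefficients, half of the samples kept at random
rng = np.random.default_rng(0)
N = 256
c_true = np.zeros(N)
c_true[rng.choice(N, size=8, replace=False)] = rng.choice([-1.0, 1.0], size=8) * (1 + rng.random(8))
x_true = idct(c_true, norm="ortho")
mask = rng.random(N) < 0.5
y = mask * x_true

x_hat = iterative_thresholding(y, mask)
print(f"zero-filled: {snr_db(x_true, y):.1f} dB, after iterations: {snr_db(x_true, x_hat):.1f} dB")
```

The toy signal is exactly sparse in the assumed transform domain, which is the favourable case for this kind of method; real speech and music frames are only approximately sparse.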

We use several objective performance metrics to compare the above methods; they are listed in Table II. The recovered “.WAV” files are also saved to disk for subjective evaluation.
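As a small illustration of the SNR metric in Table II, the ratio of norms can be computed as follows. The function name `snr_db` is hypothetical, and the logarithm is assumed to be base 10, as is usual for values in dB.

```python
import numpy as np

def snr_db(x, x_hat):
    """SNR in dB between a reference signal x and its reconstruction x_hat."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    noise = np.linalg.norm(x - x_hat)
    if noise == 0.0:
        return float("inf")  # perfect reconstruction
    return 20.0 * np.log10(np.linalg.norm(x) / noise)

# Example: a reconstruction that is 10% too small everywhere
t = np.arange(1024)
x = np.sin(2 * np.pi * 5 * t / 1024)
print(snr_db(x, 0.9 * x))  # about 20 dB, since the error norm is one tenth of the signal norm
```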

III Simulation Results

In this section, we present simulation results. We used speech and music “.WAV” signals sampled at 44.1–48 kHz. The frame size is 1024, and the simulations were run in MATLAB R2015a on an Intel(R) Core(TM) i7-5960X @ 3 GHz with 23 GB of RAM.

III-A Speech Signal

All sampling and recovery methods are simulated on our speech dataset and the results are presented in Fig. 1 in terms of SNR.

Fig. 1 shows the SNR of all methods versus the sampling rate. In the uniform sampling scheme, we used periodic uniform sampling for sampling rates greater than 0.5. According to Fig. 1, uniform sampling with spline interpolation outperforms the other methods. Another observation is that spline interpolation works well with uniform samples, but its performance degrades in the case of random sampling.
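The difference between spline recovery from uniform and from random samples can be tried on a toy low-pass signal. This is only a sketch under assumptions: the test signal, the sampling rate of 1/8, the random seed, and the use of SciPy's `CubicSpline` with its default end conditions are illustrative choices, not the paper's experimental setup.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def snr_db(x, x_hat):
    return 20.0 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat))

def spline_recover(sample_idx, samples, n_total):
    """Fit a cubic spline to the kept samples and evaluate it on the full grid."""
    spline = CubicSpline(sample_idx, samples)
    return spline(np.arange(n_total))

# Smooth (low-pass) test frame of 1024 samples
N = 1024
n = np.arange(N)
x = np.sin(2 * np.pi * 3 * n / N) + 0.5 * np.sin(2 * np.pi * 7 * n / N)

# Keep one sample in eight, either on a regular grid or at random positions
step = 8
uniform_idx = np.arange(0, N, step)
rng = np.random.default_rng(0)
random_idx = np.sort(rng.choice(N, size=uniform_idx.size, replace=False))

snr_uniform = snr_db(x, spline_recover(uniform_idx, x[uniform_idx], N))
snr_random = snr_db(x, spline_recover(random_idx, x[random_idx], N))
print(f"uniform: {snr_uniform:.1f} dB, random: {snr_random:.1f} dB")
```

With the same number of samples, the random pattern leaves some long gaps, and the spline error grows quickly with the gap length, which is one plausible reading of the degradation seen in Fig. 1.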

The Perceptual Evaluation of Speech Quality (PESQ) metric is employed to assess the quality of the recovered speech signals. PESQ is standardized as ITU-T Rec. P.862 [14]. It rates voice quality with a score from 1.0 (worst) to 4.5 (best), as listed in Table II. The results are shown in Fig. 2.

III-B Music Signal

We have also compared uniform and random sampling on music signals. The SNR of all methods is compared in Fig. 3. In the uniform sampling scheme, the high-frequency components are filtered out; therefore, we expect spline interpolation not to work as well for music signals as it does for speech signals.

Although PESQ is not designed for music, we measured it for the recovered music signals. Fig. 4 presents the PESQ of all methods. Uniform sampling with an FIR filter as the anti-aliasing filter attains the best PESQ among these sampling and recovery methods.

We also tested different FIR and IIR filters as the anti-aliasing filter before uniform sampling. The original signal is recovered by cubic spline interpolation. The results are shown in Figs. 5-8. Using an FIR filter as the anti-aliasing filter leads to better performance in terms of SNR and PESQ.
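The paper does not specify the FIR designs it tested. As one possible example, a windowed-sinc low-pass filter can serve as the anti-aliasing filter before keeping every `factor`-th sample. The Hamming window, the 101 taps, the cutoff at the new Nyquist frequency, and the function names are assumptions for illustration.

```python
import numpy as np

def lowpass_fir(num_taps, cutoff):
    """Windowed-sinc low-pass FIR; cutoff in cycles/sample (0 < cutoff < 0.5)."""
    k = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * k) * np.hamming(num_taps)
    return h / h.sum()  # normalise to unity gain at DC

def antialias_and_decimate(x, factor, num_taps=101):
    """Low-pass filter x, then keep every `factor`-th sample."""
    h = lowpass_fir(num_taps, 0.5 / factor)
    # Odd-length symmetric filter with mode="same": no net delay
    filtered = np.convolve(x, h, mode="same")
    return filtered[::factor], filtered

# A low tone that should survive and a high tone that would alias after decimation by 4
n = np.arange(4000)
low = np.sin(2 * np.pi * 0.01 * n)
high = np.sin(2 * np.pi * 0.30 * n)
decimated, filtered = antialias_and_decimate(low + high, factor=4)

core = slice(200, -200)  # ignore edge effects of the convolution
residual = np.sqrt(np.mean((filtered[core] - low[core]) ** 2))
print(f"RMS of what remains besides the low tone: {residual:.4f}")
```

The decimated samples can then be passed to a cubic spline for recovery, as in the uniform schemes of Table I.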

III-C Complexity Comparison

We provide a complexity comparison between the methods of Table I by measuring the run time of the simulations. Fig. 9 shows the run time for different sampling rates. The simulation time does not change as the sampling rate changes; this holds for almost all methods.

The normalized CPU time is also charted in Fig. 10. Random sampling with IMATI for reconstruction takes more time than uniform sampling with basic LP filtering or spline interpolation.
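For readers reproducing the timing outside MATLAB, a rough Python analogue of a tic/toc measurement is sketched below. The helper name `timed` and the choice of reporting the best of several runs are assumptions; like tic and toc, `time.perf_counter` measures elapsed wall-clock time, whereas `time.process_time` would measure CPU time of the process.

```python
import time

def timed(fn, *args, repeats=5, **kwargs):
    """Return (best wall-clock seconds over `repeats` runs, result of the last run)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()                      # "tic"
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)    # "toc"
    return best, result

# Example: time a cheap built-in call
elapsed, total = timed(sum, range(100_000))
print(f"{elapsed:.6f} s")
```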

IV Cubic Spline Interpolation

We observed that spline interpolation does not work well in the random sampling case. Among spline interpolators, the cubic spline gives a smooth interpolating polynomial that matches the samples well. Thus, when the signal is band-limited, the cubic spline leads to a smaller error. As the name suggests, in the cubic spline, the interpolated value at each missing sample is a cubic interpolation of the values at neighboring samples. Given a set of $n+1$ data points $(x_{i},y_{i})$ where $x_{0}<x_{1}<\ldots<x_{n}$, the spline interpolator $S(x)$ is a polynomial of degree 3 on each subinterval $[x_{i-1},x_{i}]$ where $i=1,\ldots,n$, i.e.,

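$$S(x)=\begin{cases}C_{1}(x), & x_{0}\leq x\leq x_{1}\\ \quad\vdots & \\ C_{i}(x), & x_{i-1}\leq x\leq x_{i}\\ \quad\vdots & \\ C_{n}(x), & x_{n-1}\leq x\leq x_{n},\end{cases}$$

where $C_{i}(x)=a_{i}+b_{i}x+c_{i}x^{2}+d_{i}x^{3}\ (d_{i}\neq 0)$ is a cubic function. An example of cubic spline interpolation is shown in Fig. 11.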
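The $4n$ coefficients $a_{i}, b_{i}, c_{i}, d_{i}$ are usually fixed by requiring that $S(x)$ passes through the data points and has continuous first and second derivatives at the interior points. The paper does not state which end conditions it uses, so the natural boundary condition in the last line below is an assumption chosen for illustration:

$$\begin{aligned}
C_{i}(x_{i-1}) &= y_{i-1}, \quad C_{i}(x_{i}) = y_{i}, & i &= 1,\ldots,n,\\
C_{i}'(x_{i}) &= C_{i+1}'(x_{i}), & i &= 1,\ldots,n-1,\\
C_{i}''(x_{i}) &= C_{i+1}''(x_{i}), & i &= 1,\ldots,n-1,\\
S''(x_{0}) &= S''(x_{n}) = 0.
\end{aligned}$$

These are $2n + (n-1) + (n-1) + 2 = 4n$ linear equations, which matches the number of unknown coefficients.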

We generated an artificial 64-sparse signal of length 1024 and uniformly sampled its inverse-transformed version. The original, sampled, and reconstructed signals are shown in Fig. 12. The original signal is not smooth at all; in other words, it is not a nearly band-limited or LP signal. Consider the two samples highlighted in Fig. 12. A smooth interpolator would predict a value near zero for the original signal between these two samples, whereas the true value is about -30.
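An experiment of this kind can be sketched as follows. The paper does not name the transform or the sampling rate used for Fig. 12, so the DCT, the rate of one sample in two, the coefficient amplitudes, and the random seed are assumptions; the resulting numbers will not match the figure.

```python
import numpy as np
from scipy.fft import idct
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(1)
N, K = 1024, 64

# 64 non-zero transform coefficients at random positions (DCT domain assumed)
coeffs = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
coeffs[support] = rng.choice([-1.0, 1.0], size=K) * (1.0 + rng.random(K))
x = idct(coeffs, norm="ortho")  # time-domain signal, far from low-pass

# Uniform sampling (every second sample) and cubic spline recovery
idx = np.arange(0, N, 2)
x_hat = CubicSpline(idx, x[idx])(np.arange(N))

snr = 20 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat))
print(f"SNR of spline recovery for the sparse, non-low-pass signal: {snr:.1f} dB")
```

Because the non-zero coefficients are spread over the whole band, about half of the signal energy lies above the new Nyquist frequency, and the spline cannot follow it between the kept samples.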

V Conclusion

In this paper, we compared uniform and random sampling schemes for speech and music signals, using different reconstruction techniques for both sampling schemes. When the speech signal is divided into frames, both the objective performance metrics and the subjective evaluations indicate that uniform sampling with spline interpolation outperforms the other methods. This is because speech and music signals are LP signals. However, it does not hold if the signal has high-frequency components or is sparse. If the signal is not purely low-pass or sparse, one can use sub-band coding, in which the low- and high-frequency components are sampled uniformly and randomly, respectively.

Bibliography

[1] C. E. Shannon, “Communication in the presence of noise,” in Proc. IRE, vol. 37, January 1949, pp. 10–21.
[2] F. Marvasti, Nonuniform Sampling: Theory and Practice. Springer, formerly Kluwer Academic/Plenum Publishers, 2001.
[3] H. J. Landau, “Necessary density conditions for sampling and interpolation of certain entire functions,” Acta Mathematica, vol. 117, no. 1, pp. 37–52, 1967.
[4] A. J. Jerri, “The Shannon sampling theorem: Its various extensions and applications: A tutorial review,” Proceedings of the IEEE, vol. 65, no. 11, pp. 1565–1596, November 1977.
[5] M. B. Mashhadi, N. Salarieh, E. S. Farahani, and F. A. Marvasti, “Level crossing speech sampling and its sparsity promoting reconstruction using an iterative method with adaptive thresholding,” IET Signal Processing, 2017.
[6] F. Marvasti, “Spectrum of nonuniform samples,” Electronics Letters, vol. 20, no. 21, pp. 896–897, October 1984.
[7] J. L. Yen, “On nonuniform sampling of bandwidth-limited signals,” IRE Transactions on Circuit Theory, vol. CT-3, pp. 251–257, December 1956.
[8] A. Papoulis, Signal Analysis. New York: McGraw-Hill, 1997.