Ensemble of classifiers for speech evaluation
G. Belokrylov, A. Korenev, B. Lodonova, A. Novokhrestov

TL;DR
This paper explores an ensemble of classifiers to evaluate speech quality in medical applications, using multiple metrics and expert assessments to improve classification accuracy.
Contribution
It introduces an ensemble approach combining five classifiers for speech assessment, demonstrating slight accuracy improvements over individual models.
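As a rough illustration of how the outputs of five binary classifiers could be combined, the sketch below uses simple majority voting. The summary does not state which combination rule the authors use, so majority voting is an assumption here, and the threshold rules (`threshold_rule`, `toy_ensemble`) are hypothetical stand-ins for trained models, not the paper's classifiers.

```python
from typing import Callable, List, Sequence

# A binary classifier is modelled as any function that maps a
# feature vector to a class label: 1 (high-quality) or 0 (distorted).
Classifier = Callable[[Sequence[float]], int]


def majority_vote(classifiers: List[Classifier], features: Sequence[float]) -> int:
    """Return 1 if more than half of the classifiers predict class 1, else 0."""
    votes = [clf(features) for clf in classifiers]
    return 1 if sum(votes) * 2 > len(votes) else 0


def threshold_rule(index: int, cutoff: float) -> Classifier:
    """Hypothetical stand-in for a trained model: predicts class 1 when
    the chosen distance feature is below the cutoff."""
    def predict(features: Sequence[float]) -> int:
        return 1 if features[index] < cutoff else 0
    return predict


# Five toy members (assumed values, for illustration only).
toy_ensemble = [
    threshold_rule(0, 0.5),
    threshold_rule(0, 0.3),
    threshold_rule(1, 1.0),
    threshold_rule(1, 0.2),
    threshold_rule(2, 0.4),
]

close_sample = [0.4, 0.5, 0.1]   # small distances to the reference
far_sample = [0.9, 0.9, 0.9]     # large distances to the reference

label_close = majority_vote(toy_ensemble, close_sample)
label_far = majority_vote(toy_ensemble, far_sample)
```

With an odd number of members, a majority vote on binary labels can never tie, which is one practical reason to combine five classifiers rather than four.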
Findings
Ensemble method slightly outperforms individual classifiers.
Multiple distance-based metrics are effective features.
Support vector machine and ensemble methods show promising results.
Abstract
The article describes an attempt to apply an ensemble of binary classifiers to the problem of speech assessment in medicine. A dataset was compiled from quantitative and expert assessments of syllable pronunciation quality. Quantitative values of 7 selected metrics were used as features: dynamic time warping (DTW) distance, Minkowski distance, correlation coefficient, longest common subsequence (LCSS), edit distance on real sequence (EDR), edit distance with real penalty (ERP), and move-split-merge (MSM). The expert assessment of pronunciation quality was used as the class label: class 1 means high-quality speech, class 0 means distorted speech. Training results were compared for five classification methods: logistic regression (LR), support vector machine (SVM), naive Bayes (NB), decision trees (DT), and K-nearest neighbors (KNN). The results of using the mixture method to…
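To make the feature set more concrete, here is a minimal sketch of two of the seven listed metrics, DTW distance and Minkowski distance, computed between a reference signal and a test signal. It assumes each syllable is represented as a one-dimensional numeric sequence and uses the absolute difference as the local DTW cost; the abstract does not specify the signal representation, the local cost, or the Minkowski order, so those choices are assumptions.

```python
import math
from typing import Sequence


def minkowski_distance(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p between two equal-length sequences.
    p = 2 (Euclidean) is an assumed default, not a value from the paper."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic-programming DTW with |x - y| as the local cost.
    Sequences may differ in length."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matched an earlier b element
                cost[i][j - 1],      # b[j-1] also matched an earlier a element
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


reference = [1.0, 2.0, 3.0]
stretched = [1.0, 2.0, 2.0, 3.0]  # same shape, one sample repeated

dtw_stretched = dtw_distance(reference, stretched)
dtw_shifted = dtw_distance([0.0, 0.0], [1.0, 1.0])
euclid = minkowski_distance([0.0, 0.0], [3.0, 4.0], p=2.0)
```

DTW tolerates local differences in timing (the stretched sequence aligns to the reference at zero cost), whereas Minkowski distance compares samples position by position and requires equal lengths, so the two features capture different aspects of how a pronunciation deviates from the reference.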
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Logistic Regression
