Human Vocal Sentiment Analysis
Andrew Huang, Puwei Bao

TL;DR
This paper explores combining traditional vocal features, deep learning, and contextual textual data to improve emotion classification in speech, assessing new models and real-time feasibility.
Contribution
It introduces novel combinations of vocal and textual analysis techniques, including testing new models and evaluating real-time application potential.
Findings
Enhanced emotion classification accuracy with combined methods
Effective data augmentation and hyperparameter tuning
Feasibility of real-time emotion detection
Abstract
In this paper, we combine conventional vocal feature extraction (MFCC, STFT) with deep-learning approaches such as CNNs and with context-level analysis of the accompanying textual data, with the goal of improving emotion classification. We evaluate models that have not previously been tested to gauge differences in performance and accuracy. We apply hyperparameter sweeps and data augmentation to improve performance. Finally, we examine whether a real-time approach is feasible and can be readily integrated into existing systems.
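To make the feature-extraction and augmentation steps named in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names (`stft_magnitude`, `add_noise`, `time_shift`), the frame length, hop size, sample rate, and the two augmentations chosen are all assumptions made for illustration. MFCCs would additionally require a mel filterbank and a discrete cosine transform on top of this spectrogram, which the sketch omits.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real signal.
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)

def add_noise(signal, snr_db, rng):
    """Augmentation (assumed): additive white Gaussian noise at a target SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def time_shift(signal, shift):
    """Augmentation (assumed): circular shift by `shift` samples."""
    return np.roll(signal, shift)

# Synthetic stand-in for a speech clip: one second of a 1 kHz tone at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)

rng = np.random.default_rng(0)
augmented = time_shift(add_noise(tone, snr_db=20, rng=rng), shift=400)
spec = stft_magnitude(augmented)
```

Each row of `spec` is one time frame and each column one frequency bin, so the array could be fed to a CNN as a single-channel image. The augmented copies would be added to the training set alongside the original clips.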
Taxonomy
Topics: Music and Audio Processing · Emotion and Mood Recognition · Speech Recognition and Synthesis
