Human Vocal Sentiment Analysis

Andrew Huang; Puwei Bao

arXiv:1905.08632 · eess.AS · May 22, 2019 · 29 cites

Open Access

TL;DR

This paper explores combining traditional vocal features, deep learning, and contextual textual data to improve emotion classification in speech, assessing new models and real-time feasibility.

Contribution

The paper combines vocal and textual analysis techniques, tests models that had not previously been evaluated, and assesses the potential for real-time application.

Findings

01. Enhanced emotion classification accuracy with combined methods
02. Effective data augmentation and hyperparameter tuning
03. Feasibility of real-time emotion detection

Abstract

In this paper, we combine conventional vocal feature extraction (MFCC, STFT) with deep-learning approaches such as CNNs and with context-level analysis of the accompanying textual data, with the aim of improving emotion classification. We explore models that have not previously been tested to gauge the difference in performance and accuracy. We apply hyperparameter sweeps and data augmentation to improve performance. Finally, we examine whether a real-time approach is feasible and can be readily integrated into existing systems.
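As an illustration of the STFT features mentioned in the abstract, the following is a minimal sketch of a magnitude spectrogram computed with the Python standard library only. It is not the authors' pipeline: the function names (`hann_window`, `stft_magnitude`, `dominant_bin`), the frame length, the hop size, and the synthetic test tone are all assumptions chosen for illustration. A practical system would more likely use an FFT-based audio library and real speech recordings.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n (hypothetical helper)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude STFT via a naive DFT.

    Returns one list per frame, holding the magnitudes of the
    non-negative frequency bins 0..frame_len // 2.
    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = hann_window(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Taper the frame to reduce spectral leakage.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            acc = 0j
            for n, x in enumerate(chunk):
                acc += x * cmath.exp(-2j * math.pi * k * n / frame_len)
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def dominant_bin(frame):
    """Index of the strongest frequency bin in one frame."""
    return max(range(len(frame)), key=lambda k: frame[k])


# Synthetic stand-in for a voiced segment: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone_hz = 1000.0
signal = [math.sin(2.0 * math.pi * tone_hz * t / sample_rate) for t in range(256)]

spec = stft_magnitude(signal)
peak_hz = dominant_bin(spec[0]) * sample_rate / 64
print(len(spec), "frames; peak of first frame at", peak_hz, "Hz")
```

Each frame of `spec` is a short-time spectrum. Stacking the frames gives the time-frequency image that a CNN could take as input, and MFCCs are typically derived from such frames by applying a mel filterbank, a logarithm, and a discrete cosine transform.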

Taxonomy

Topics: Music and Audio Processing · Emotion and Mood Recognition · Speech Recognition and Synthesis