A New Amharic Speech Emotion Dataset and Classification Benchmark
Ephrem A. Retta, Eiad Almekhlafi, Richard Sutcliffe, Mustafa Mhamed, Haider Ali, Jun Feng

TL;DR
This paper introduces the Amharic Speech Emotion Dataset (ASED), the first of its kind for Amharic, and evaluates a new VGGb model for emotion recognition, demonstrating high accuracy and cross-language applicability.
Contribution
The paper presents the first Amharic Speech Emotion Dataset and develops a novel VGGb model, establishing benchmarks for Amharic SER.
Findings
MFCC features outperform Mel-spectrograms for Amharic SER
VGGb achieves 90.73% accuracy and fast training times
VGGb performs well across different languages and datasets
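To make the first finding concrete, the sketch below shows how the two feature types relate: MFCCs are conventionally obtained by taking the logarithm of the Mel-band energies for a frame and applying a discrete cosine transform (DCT-II), keeping only the first few coefficients. This is a minimal illustration in pure Python, not the authors' feature pipeline; the function names `dct2` and `mfcc_from_mel_energies`, the small epsilon, and the example energies are assumptions made for illustration.

```python
import math


def dct2(values):
    """Unnormalised DCT-II of a sequence of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def mfcc_from_mel_energies(mel_energies, n_coeffs):
    """Turn one frame of Mel-band energies into its first n_coeffs MFCCs.

    Hypothetical helper: log-compress the energies, then apply DCT-II.
    The epsilon only guards against log(0) on silent bands.
    """
    log_mel = [math.log(e + 1e-10) for e in mel_energies]
    return dct2(log_mel)[:n_coeffs]


# Made-up Mel-band energies for a single frame (8 bands).
frame = [0.9, 1.4, 2.1, 1.7, 0.8, 0.5, 0.3, 0.2]
print(mfcc_from_mel_energies(frame, n_coeffs=4))
```

The Mel-spectrogram keeps every band of every frame, whereas the MFCC representation keeps a short, decorrelated summary of each frame's spectral envelope, which is one plausible reason the two can behave differently as classifier inputs.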
Abstract
In this paper we present the Amharic Speech Emotion Dataset (ASED), which covers four dialects (Gojjam, Wollo, Shewa and Gonder) and five different emotions (neutral, fearful, happy, sad and angry). We believe it is the first Speech Emotion Recognition (SER) dataset for the Amharic language. 65 volunteer participants, all native speakers, recorded 2,474 sound samples, two to four seconds in length. Eight judges assigned emotions to the samples with high agreement level (Fleiss kappa = 0.8). The resulting dataset is freely available for download. Next, we developed a four-layer variant of the well-known VGG model which we call VGGb. Three experiments were then carried out using VGGb for SER, using ASED. First, we investigated whether Mel-spectrogram features or Mel-frequency Cepstral coefficient (MFCC) features work best for Amharic. This was done by training two VGGb SER models on ASED,…
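The agreement figure quoted in the abstract (Fleiss kappa = 0.8) can be illustrated with a short sketch of how Fleiss' kappa is computed from a table of rating counts. This is the standard textbook formula written in pure Python, not the authors' code; the function name `fleiss_kappa` and the toy ratings are hypothetical and are not taken from ASED.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])
    total = n_items * n_raters

    # Share of all assignments that fall in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_items        # mean observed agreement
    p_e = sum(p * p for p in p_j)     # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


# Toy data (made up): 4 clips, 8 judges, 5 emotion categories in the
# order neutral, fearful, happy, sad, angry.
toy_ratings = [
    [8, 0, 0, 0, 0],
    [0, 7, 0, 1, 0],
    [0, 0, 8, 0, 0],
    [1, 0, 0, 6, 1],
]
print(round(fleiss_kappa(toy_ratings), 3))
```

Values near 1 indicate that the judges agree far more often than chance would predict, while values near 0 indicate chance-level agreement.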
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development
Methods: Convolution · Dense Connections · Softmax · Max Pooling · Dropout · Sigmoid Activation · Tanh Activation · Long Short-Term Memory
