# Amplifying Emotional Signals: Data-Efficient Deep Learning for Robust Speech Emotion Recognition

**Authors:** Tai Vu

arXiv: 2509.00077 · 2025-09-03

## TL;DR

This paper evaluates SVM, LSTM, and CNN models for speech emotion recognition and shows that transfer learning and data augmentation improve performance on a relatively small dataset. The best model, a ResNet34, reaches 66.7% accuracy and an F1 score of 0.631 on the combined RAVDESS and SAVEE datasets.

## Contribution

It combines transfer learning from pre-trained models with data augmentation for SER and reports a new performance benchmark on the combined RAVDESS and SAVEE datasets.

## Key findings

- ResNet34 achieved 66.7% accuracy and an F1 score of 0.631 on the combined RAVDESS and SAVEE datasets.
- Transfer learning and data augmentation significantly improved model performance.
- The authors argue that pre-trained models and data augmentation pave the way for more robust and generalizable SER systems when data is scarce.
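
The accuracy and F1 figures above measure different things, which is why they can diverge on a multi-class task. The following is a minimal sketch of both metrics in plain Python. It assumes macro-averaged F1 (this summary does not say which averaging the paper uses), and the emotion labels and predictions are invented toy data, not results from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only.
toy_true = ["happy", "happy", "sad", "sad", "angry", "angry"]
toy_pred = ["happy", "happy", "sad", "happy", "angry", "sad"]
toy_accuracy = accuracy(toy_true, toy_pred)
toy_macro_f1 = macro_f1(toy_true, toy_pred)
```

Macro averaging weights every emotion class equally, so a class the model handles poorly pulls the score down even when overall accuracy looks reasonable.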

## Abstract

Speech Emotion Recognition (SER) presents a significant yet persistent challenge in human-computer interaction. While deep learning has advanced spoken language processing, achieving high performance on limited datasets remains a critical hurdle. This paper confronts this issue by developing and evaluating a suite of machine learning models, including Support Vector Machines (SVMs), Long Short-Term Memory networks (LSTMs), and Convolutional Neural Networks (CNNs), for automated emotion classification in human speech. We demonstrate that by strategically employing transfer learning and innovative data augmentation techniques, our models can achieve impressive performance despite the constraints of a relatively small dataset. Our most effective model, a ResNet34 architecture, establishes a new performance benchmark on the combined RAVDESS and SAVEE datasets, attaining an accuracy of 66.7% and an F1 score of 0.631. These results underscore the substantial benefits of leveraging pre-trained models and data augmentation to overcome data scarcity, thereby paving the way for more robust and generalizable SER systems.
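
The abstract does not say which augmentation techniques the paper uses, so the following is only a sketch of two common waveform-level augmentations, additive Gaussian noise and a random circular time shift, written in plain Python. The function names, the `noise_std` and `max_shift` parameters, and their default values are all hypothetical choices for illustration, not the author's method.

```python
import random


def add_noise(samples, noise_std=0.005, rng=None):
    """Return a copy of the waveform with Gaussian noise added to each sample."""
    if rng is None:
        rng = random.Random()
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def time_shift(samples, max_shift, rng=None):
    """Circularly shift the waveform by a random offset in [-max_shift, max_shift]."""
    if rng is None:
        rng = random.Random()
    if not samples:
        return []
    k = rng.randint(-max_shift, max_shift) % len(samples)
    if k == 0:
        return list(samples)
    return samples[-k:] + samples[:-k]


def augment(samples, copies=3, seed=0):
    """Produce several perturbed variants of one utterance (hypothetical recipe)."""
    rng = random.Random(seed)  # seeded so the variants are reproducible
    max_shift = len(samples) // 10
    variants = []
    for _ in range(copies):
        shifted = time_shift(samples, max_shift, rng=rng)
        variants.append(add_noise(shifted, rng=rng))
    return variants
```

Each variant keeps the original emotion label, so a small corpus yields several training examples per recording. In a spectrogram-based pipeline such as one feeding a CNN, perturbations like these would be applied before the spectrogram is computed.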

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00077/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00077/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/2509.00077/full.md

---
Source: https://tomesphere.com/paper/2509.00077