EmotionNAS: Two-stream Neural Architecture Search for Speech Emotion Recognition

Haiyang Sun; Zheng Lian; Bin Liu; Ying Li; Licai Sun; Cong Cai; Jianhua Tao; Meng Wang; Yuan Cheng

arXiv:2203.13617 · eess.AS · June 12, 2023

Open Access

TL;DR

EmotionNAS introduces a two-stream neural architecture search framework for speech emotion recognition, automatically optimizing model structures for different feature types, leading to state-of-the-art performance.

Contribution

The paper presents a novel two-stream NAS framework that effectively combines handcrafted and deep features for SER, reducing manual design effort and improving accuracy.

Findings

01. Outperforms existing models on SER benchmarks

02. Sets new state-of-the-art results

03. Effectively integrates complementary features

Abstract

Speech emotion recognition (SER) is an important research topic in human-computer interaction. Existing works mainly rely on human expertise to design models. Despite their success, different datasets often require distinct structures and hyperparameters. Searching for an optimal model for each dataset is time-consuming and labor-intensive. To address this problem, we propose a two-stream neural architecture search (NAS) based framework, called "EmotionNAS". Specifically, we take two-stream features (i.e., handcrafted and deep features) as the inputs, followed by NAS to search for the optimal structure for each stream. Furthermore, we incorporate complementary information in different streams through an efficient information supplement module. Experimental results demonstrate that our method outperforms existing manually-designed and NAS-based models, setting the new…
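
As a rough illustration of the idea in the abstract, the sketch below shows one way a two-stream search could be set up. It is not the authors' implementation. It assumes a differentiable, softmax-weighted mixture over candidate operations (a common NAS relaxation that the abstract does not name), and every identifier in it (`CANDIDATE_OPS`, `mixed_op`, `derive_op`, `two_stream_forward`, `gate`) is hypothetical. The "information supplement" here is a simple gated addition of one stream into the other, chosen only to keep the example small; both streams are assumed to produce vectors of equal length.

```python
import math


def softmax(weights):
    """Numerically stable softmax over a list of floats."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidate operations; each maps a feature vector to a vector.
CANDIDATE_OPS = {
    "identity": lambda x: list(x),
    "scale_half": lambda x: [0.5 * v for v in x],
    "relu": lambda x: [max(0.0, v) for v in x],
}


def mixed_op(x, alphas):
    """Softmax-weighted sum of all candidate ops (assumed continuous relaxation)."""
    probs = softmax(alphas)
    out = [0.0] * len(x)
    for p, op in zip(probs, CANDIDATE_OPS.values()):
        for i, v in enumerate(op(x)):
            out[i] += p * v
    return out


def derive_op(alphas):
    """After search, keep the candidate with the largest architecture weight."""
    names = list(CANDIDATE_OPS)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]


def two_stream_forward(handcrafted, deep, alphas_h, alphas_d, gate=0.5):
    """Run each stream through its own searched op, then supplement one stream.

    The gated addition is an illustrative stand-in for the paper's
    information supplement module, whose actual form is not given here.
    """
    h = mixed_op(handcrafted, alphas_h)  # handcrafted-feature stream
    d = mixed_op(deep, alphas_d)         # deep-feature stream
    return [hv + gate * dv for hv, dv in zip(h, d)]


if __name__ == "__main__":
    fused = two_stream_forward([1.0, -2.0], [1.0, -2.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
    print(fused)
    print(derive_op([0.1, 2.0, -1.0]))
```

Each stream keeps its own architecture weights (`alphas_h`, `alphas_d`), so the two streams can settle on different operations, which mirrors the abstract's point that the structure is searched separately for each feature type.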

Taxonomy

Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Music and Audio Processing