SpeechFormer: A Hierarchical Efficient Framework Incorporating the Characteristics of Speech

Weidong Chen; Xiaofen Xing; Xiangmin Xu; Jianxin Pang; Lan Du

arXiv:2203.03812 · cs.SD · March 11, 2022

TL;DR

SpeechFormer is a hierarchical Transformer framework designed for speech processing that leverages speech structure to improve efficiency and performance in emotion recognition and neurocognitive disorder detection.

Contribution

It introduces a novel hierarchical structure considering speech characteristics, reducing computational cost while maintaining or improving accuracy.
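To make the cost claim concrete, the rough count below compares how many query-key scores full attention computes with how many a windowed (neighboring) attention computes. This is a back-of-the-envelope sketch: the function names, the sequence length, and the window size are illustrative assumptions, not values taken from the paper.

```python
def full_attention_pairs(seq_len: int) -> int:
    """Every position scores against every position: seq_len ** 2 pairs."""
    return seq_len * seq_len


def windowed_attention_pairs(seq_len: int, window: int) -> int:
    """Each position scores only against neighbors within window // 2 steps."""
    half = window // 2
    total = 0
    for i in range(seq_len):
        lo = max(0, i - half)            # clip the window at the left edge
        hi = min(seq_len - 1, i + half)  # clip the window at the right edge
        total += hi - lo + 1
    return total


# Hypothetical sizes, chosen only for illustration.
seq_len, window = 1000, 5
print(full_attention_pairs(seq_len))              # 1000000
print(windowed_attention_pairs(seq_len, window))  # 4994
```

For a fixed window the windowed count grows linearly with sequence length, while the full count grows quadratically.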

Findings

01 Outperforms standard Transformer in speech tasks

02 Reduces computational cost significantly

03 Achieves comparable results to state-of-the-art methods

Abstract

The Transformer has obtained promising results in the field of cognitive speech signal processing, which is of interest in applications ranging from emotion to neurocognitive disorder analysis. However, most works treat the speech signal as a whole, neglecting the pronunciation structure that is unique to speech and reflects the cognitive process. Meanwhile, the Transformer carries a heavy computational burden due to its full attention operation. In this paper, a hierarchical efficient framework called SpeechFormer, which considers the structural characteristics of speech, is proposed and can serve as a general-purpose backbone for cognitive speech signal processing. The proposed SpeechFormer consists of frame, phoneme, word and utterance stages in succession, each performing a neighboring attention according to the structural pattern of speech with high computational efficiency.…
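The neighboring attention mentioned in the abstract can be pictured as ordinary scaled dot-product attention restricted to a local window. The sketch below is a minimal pure-Python illustration of that idea, not the authors' implementation: the function name `neighboring_attention`, the single-head formulation, and the window sizes used in the example are all assumptions made here for clarity.

```python
import math


def neighboring_attention(queries, keys, values, window):
    """Scaled dot-product attention where position i only sees positions
    within window // 2 steps of i (a hypothetical, single-head sketch)."""
    half = window // 2
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        lo = max(0, i - half)
        hi = min(len(keys), i + half + 1)
        neighbors = range(lo, hi)
        # Scores against the local neighborhood only.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in neighbors
        ]
        # Numerically stable softmax over the neighborhood.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        norm = sum(weights)
        # Weighted average of the neighboring value vectors.
        outputs.append([
            sum(w / norm * values[j][d] for w, j in zip(weights, neighbors))
            for d in range(len(values[0]))
        ])
    return outputs


# Illustrative input: three one-dimensional tokens with identical queries,
# so each position averages its neighbors uniformly.
q = [[0.0], [0.0], [0.0]]
v = [[0.0], [3.0], [6.0]]
print(neighboring_attention(q, q, v, window=3))
```

A wider window lets each position see more context at higher cost; in a hierarchical design, later stages would use a span suited to longer units such as words, though the specific spans are not given in the text above.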

Code & Models

Repositories

happycolor/speechformer
PyTorch · Official

Taxonomy

Topics

Speech Recognition and Synthesis · Emotion and Mood Recognition · EEG and Brain-Computer Interfaces