# Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations

**Authors:** Siyuan Shen, Feng Liu, Aimin Zhou

arXiv: 2302.13277 · 2023-03-02

## TL;DR

This paper introduces a parameter-free temporal shift module that mingles channel-wise information in the downstream network built on pre-trained speech representations, improving speech emotion recognition on the IEMOCAP benchmark.

## Contribution

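The paper proposes a temporal shift module for the downstream network of pre-trained speech models that adds no parameters or FLOPs. The module turns three baseline building blocks into shift variants (ShiftCNN, ShiftLSTM, and Shiftformer), and two strategies, placement of shift and proportion of shift, balance the trade-off between mingling and misalignment.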

## Key findings

- All temporal shift variants outperform prior state-of-the-art methods on the IEMOCAP dataset
- Effective in both finetuning and feature extraction settings
- Improves performance without introducing any parameters or FLOPs

## Abstract

Fueled by recent advances of self-supervised models, pre-trained speech representations have proved effective for the downstream speech emotion recognition (SER) task. Most prior works focus on exploiting pre-trained representations and simply adopt a linear head on top of the pre-trained model, neglecting the design of the downstream network. In this paper, we propose a temporal shift module to mingle channel-wise information without introducing any parameter or FLOP. With the temporal shift module, three designed baseline building blocks evolve into corresponding shift variants, i.e. ShiftCNN, ShiftLSTM, and Shiftformer. Moreover, to balance the trade-off between mingling and misalignment, we propose two technical strategies, placement of shift and proportion of shift. The family of temporal shift models all outperform the state-of-the-art methods on the benchmark IEMOCAP dataset under both finetuning and feature extraction settings. Our code is available at https://github.com/ECNU-Cross-Innovation-Lab/ShiftSER.
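
To make the idea of a parameter-free shift concrete, here is a minimal, framework-free sketch. It is not taken from the authors' repository. It assumes a common formulation in which a fraction of the channels is moved one frame forward in time, an equal fraction one frame backward, and the vacated positions are zero-filled; the function name `temporal_shift`, the `proportion` argument, and the `[time][channel]` layout are all hypothetical choices for illustration.

```python
from typing import List

Frames = List[List[float]]  # hypothetical layout: [time][channel]


def temporal_shift(x: Frames, proportion: float = 0.5) -> Frames:
    """Shift a fraction of channels by one step along the time axis.

    Assumed scheme (not confirmed by this summary): the first `fold`
    channels take their value from the next frame, the following `fold`
    channels take it from the previous frame, and the rest stay in place.
    Positions with no source frame are zero-filled.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    if not x:
        return []
    n_time, n_chan = len(x), len(x[0])
    fold = int(n_chan * proportion) // 2  # channels per shift direction
    out = [[0.0] * n_chan for _ in range(n_time)]
    for t in range(n_time):
        for c in range(n_chan):
            if c < fold:
                src = t + 1  # read one frame ahead
            elif c < 2 * fold:
                src = t - 1  # read one frame back
            else:
                src = t  # unshifted channel
            if 0 <= src < n_time:
                out[t][c] = x[src][c]  # pure copy, no arithmetic
    return out


if __name__ == "__main__":
    frames = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    for row in temporal_shift(frames, proportion=0.5):
        print(row)
```

The sketch only copies values between positions, so it has no learnable weights and performs no multiply-adds, which is the sense in which such a module can be called parameter- and FLOP-free. Under this assumed scheme, raising `proportion` mixes more neighbouring-frame information into each time step but also displaces more channels from their original frame, which is the mingling versus misalignment trade-off the abstract refers to.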

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.13277/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/2302.13277/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/2302.13277/full.md

---
Source: https://tomesphere.com/paper/2302.13277