# Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

**Authors:** Santiago Pascual, Mirco Ravanelli, Joan Serrà, Antonio Bonafonte, Yoshua Bengio

arXiv: 1904.03416 · 2019-04-09

## TL;DR

This paper introduces a multi-task self-supervised learning approach for speech representations, enabling the encoder to learn transferable, robust, and general features that capture various speech attributes without supervision.

## Contribution

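The paper proposes a multi-task self-supervised framework in which a single neural encoder feeds multiple workers, each solving a different self-supervised task. The consensus required across tasks constrains the encoder, which improves the transferability and robustness of the learned speech representations.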

## Key findings

- Learned features encode speaker identity, phonemes, and emotional cues.
- The approach produces transferable and problem-agnostic speech representations.
- Several design choices make the encoder easy to export, so it can be used directly or adapted to different problems.

## Abstract

Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure. Some recent works, however, have shown that it is possible to derive useful speech representations by employing a self-supervised encoder-discriminator approach. This paper proposes an improved self-supervised method, where a single neural encoder is followed by multiple workers that jointly solve different self-supervised tasks. The needed consensus across different tasks naturally imposes meaningful constraints to the encoder, contributing to discover general representations and to minimize the risk of learning superficial ones. Experiments show that the proposed approach can learn transferable, robust, and problem-agnostic features that carry on relevant information from the speech signal, such as speaker identity, phonemes, and even higher-level features such as emotional cues. In addition, a number of design choices make the encoder easily exportable, facilitating its direct usage or adaptation to different problems.
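The encoder-plus-workers idea in the abstract can be illustrated with a minimal sketch, assuming a toy setup rather than the paper's actual model. One shared encoder produces an embedding, two workers each compute their own loss from that same embedding, and the losses are summed into a single objective, so the encoder has to satisfy every task at once. The two worker types (`regression_loss`, `binary_loss`), the layer sizes, the random inputs, and the unweighted sum are all hypothetical choices made for illustration; this summary does not specify the paper's tasks or how their losses are combined.

```python
import math
import random

def linear(weights, bias, x):
    """Apply y = W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def encode(encoder, frame):
    """Shared encoder: one linear layer plus tanh, used by every worker."""
    weights, bias = encoder
    return [math.tanh(v) for v in linear(weights, bias, frame)]

def regression_loss(worker, embedding, target):
    """Hypothetical regression worker: mean squared error against a target vector."""
    pred = linear(*worker, embedding)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def binary_loss(worker, embedding, label):
    """Hypothetical binary worker: cross-entropy on a single logit."""
    logit = linear(*worker, embedding)[0]
    prob = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def total_loss(encoder, workers, frame, targets):
    """Sum the worker losses computed from one shared embedding."""
    embedding = encode(encoder, frame)
    losses = {
        "regression": regression_loss(workers["regression"], embedding, targets["regression"]),
        "binary": binary_loss(workers["binary"], embedding, targets["binary"]),
    }
    return sum(losses.values()), losses

# Toy sizes and random data, chosen only so the sketch runs end to end.
rng = random.Random(0)
FRAME, EMB = 8, 4
encoder = init_layer(rng, EMB, FRAME)
workers = {"regression": init_layer(rng, 2, EMB), "binary": init_layer(rng, 1, EMB)}
frame = [rng.uniform(-1.0, 1.0) for _ in range(FRAME)]
targets = {"regression": [0.5, -0.5], "binary": 1}

loss, per_worker = total_loss(encoder, workers, frame, targets)
print(f"total={loss:.4f}", per_worker)
```

In a real training loop the gradient of the summed loss would flow back through every worker into the shared encoder. That shared gradient path is where the cross-task consensus described in the abstract would come from.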

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.03416/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1904.03416/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/1904.03416/full.md

---
Source: https://tomesphere.com/paper/1904.03416