# Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

**Authors:** Yerbolat Khassanov, Haihua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma

arXiv: 1904.03802 · 2021-01-14

## TL;DR

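This paper introduces a training method for end-to-end code-switching speech recognition that uses only monolingual data. It constrains the output token embeddings of the two languages to have similar distributions, which makes it easier for the model to switch languages and yields up to 4.5% absolute mixed error rate improvement on a Mandarin-English task.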

## Contribution

It proposes a new approach using Jensen-Shannon divergence and cosine distance constraints to align monolingual output embeddings, enabling effective code-switching recognition without bilingual training data.
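
For reference, the two quantities named above have the following textbook definitions. The symbols are assumptions introduced here for illustration: $P_E$ and $P_M$ stand for the distributions of the English and Mandarin output token embeddings, and $\mu_E$, $\mu_M$ for their centroids. How the paper estimates these distributions and weights the two terms in the training loss is not covered in this summary.

$$
\mathrm{JSD}(P_E \,\|\, P_M) = \tfrac{1}{2}\,\mathrm{KL}(P_E \,\|\, Q) + \tfrac{1}{2}\,\mathrm{KL}(P_M \,\|\, Q),
\qquad Q = \tfrac{1}{2}\,(P_E + P_M)
$$

$$
d_{\cos}(\mu_E, \mu_M) = 1 - \frac{\mu_E^{\top} \mu_M}{\lVert \mu_E \rVert \, \lVert \mu_M \rVert}
$$

Both quantities are zero when the two sets of embeddings coincide, so minimising either one pulls the two languages' output embeddings together.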

## Key findings

- Achieved up to 4.5% absolute mixed error rate improvement on a Mandarin-English code-switching ASR task.
- Demonstrated the effectiveness of embedding distribution alignment in code-switching ASR.
- Validated the approach with substantial improvements over baseline models.

## Abstract

The lack of code-switch training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR using only monolingual data. Our method encourages the distributions of output token embeddings of monolingual languages to be similar, and hence, promotes the ASR model to easily code-switch between languages. Specifically, we propose to use Jensen-Shannon divergence and cosine distance based constraints. The former will enforce output embeddings of monolingual languages to possess similar distributions, while the latter simply brings the centroids of two distributions to be close to each other. Experimental results demonstrate high effectiveness of the proposed method, yielding up to 4.5% absolute mixed error rate improvement on a Mandarin-English code-switching ASR task.
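
The two constraints described in the abstract can be illustrated with a minimal sketch on toy data. This is not the authors' implementation. The function names and the toy embeddings are hypothetical, and the distribution term below is a symmetrised KL divergence between diagonal Gaussians fitted to each embedding set, used here as a simple closed-form stand-in for the Jensen-Shannon divergence; the paper's exact formulation may differ.

```python
import math


def centroid(rows):
    """Mean vector of a list of equal-length embedding vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def variances(rows, mean, eps=1e-6):
    """Per-dimension variance; eps keeps every variance strictly positive."""
    n = len(rows)
    return [sum((x - m) ** 2 for x in col) / n + eps
            for col, m in zip(zip(*rows), mean)]


def diag_gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def centroid_constraint(rows_a, rows_b):
    """Cosine distance between the centroids of two embedding sets."""
    return cosine_distance(centroid(rows_a), centroid(rows_b))


def distribution_constraint(rows_a, rows_b):
    """Symmetrised KL between diagonal Gaussians (assumed stand-in for JSD)."""
    mu_a, mu_b = centroid(rows_a), centroid(rows_b)
    var_a, var_b = variances(rows_a, mu_a), variances(rows_b, mu_b)
    return 0.5 * (diag_gaussian_kl(mu_a, var_a, mu_b, var_b)
                  + diag_gaussian_kl(mu_b, var_b, mu_a, var_a))


# Hypothetical 2-dimensional output embeddings for each language's tokens.
mandarin = [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]]
english = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9]]

print(centroid_constraint(mandarin, english))
print(distribution_constraint(mandarin, english))
```

In training, terms like these would be added to the ASR loss so that gradient updates move the two languages' output embeddings closer together. The weighting of each term is a hyperparameter that this summary does not specify.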

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.03802/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1904.03802/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1904.03802/full.md

---
Source: https://tomesphere.com/paper/1904.03802