# Spoken Language Intent Detection using Confusion2Vec

**Authors:** Prashanth Gurunath Shivakumar, Mu Yang, Panayiotis Georgiou

arXiv: 1904.03576 · 2019-10-24

## TL;DR

This paper introduces a confusion2vec-based approach for spoken language intent detection that enhances robustness against ASR errors, achieving state-of-the-art results on the ATIS dataset under noisy conditions.

## Contribution

The paper proposes using confusion2vec embeddings to improve intent detection robustness in noisy ASR environments, a novel application of acoustic-aware word representations.

## Key findings

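- Reduces classification error rate (CER) by 20.84% relative to the previous state of the art under noisy ASR conditions
- Improves robustness by 37.48%, measured as lower CER degradation going from clean to noisy transcripts, relative to the previous state of the art
- Achieves state-of-the-art results on the ATIS dataset with noisy ASR transcripts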
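
The two percentages above are relative improvements. The following is a minimal sketch of how such figures could be computed, assuming that "CER degradation" means the absolute increase in CER when moving from clean to noisy transcripts (the summary does not spell out the exact definition). The function names and the CER values are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


def cer_degradation(cer_clean: float, cer_noisy: float) -> float:
    """Assumed definition: absolute CER increase from clean to noisy transcripts."""
    return cer_noisy - cer_clean


# Hypothetical CER values in percent, chosen only for illustration.
baseline = {"clean": 2.0, "noisy": 6.0}
proposed = {"clean": 2.0, "noisy": 5.0}

# Relative CER reduction on noisy transcripts.
cer_gain = relative_reduction(baseline["noisy"], proposed["noisy"])

# Relative robustness improvement: how much smaller the degradation is.
robustness_gain = relative_reduction(
    cer_degradation(baseline["clean"], baseline["noisy"]),
    cer_degradation(proposed["clean"], proposed["noisy"]),
)

print(f"relative CER reduction: {cer_gain:.2f}%")
print(f"relative robustness improvement: {robustness_gain:.2f}%")
```

With these made-up inputs, a system can show a larger relative robustness gain than relative CER gain, because the degradation is a smaller quantity than the noisy CER itself.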

## Abstract

Decoding a speaker's intent is a crucial part of spoken language understanding (SLU). In real-life scenarios, the presence of noise or errors in the text transcriptions makes the task more challenging. In this paper, we address spoken language intent detection under the noisy conditions imposed by automatic speech recognition (ASR) systems. We propose to employ the confusion2vec word feature representation to compensate for the errors made by ASR and to increase the robustness of the SLU system. Confusion2vec, motivated by human speech production and perception, models acoustic relationships between words in addition to the semantic and syntactic relations of words in human language. We hypothesize that ASR often makes errors relating to acoustically similar words, and that confusion2vec, with its inherent model of acoustic relationships between words, is able to compensate for these errors. Through experiments on the ATIS benchmark dataset, we demonstrate the robustness of the proposed model, which achieves state-of-the-art results under noisy ASR conditions. Our system reduces classification error rate (CER) by 20.84% and improves robustness by 37.48% (lower CER degradation) relative to the previous state-of-the-art going from clean to noisy transcripts. Improvements are also demonstrated when training the intent detection models on noisy transcripts.
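
The hypothesis in the abstract can be illustrated with a toy example. This is only a sketch, not the authors' model or training procedure: the three-dimensional vectors are hand-picked, the two embedding tables (`SEMANTIC_ONLY`, `ACOUSTIC_AWARE`) and the `drift` helper are hypothetical, and the "fare"/"fair" substitution is an invented ASR error. It shows that when acoustically confusable words lie close together in the embedding space, an averaged utterance vector moves less under such an error.

```python
import math


def mean_vector(words, table):
    """Average the embeddings of `words` into one bag-of-words utterance vector."""
    dims = len(next(iter(table.values())))
    total = [0.0] * dims
    for word in words:
        for i, value in enumerate(table[word]):
            total[i] += value
    return [value / len(words) for value in total]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hand-picked toy vectors (assumption): "fair" is far from "fare" here.
SEMANTIC_ONLY = {
    "cheapest": [0.9, 0.1, 0.0],
    "fare": [0.8, 0.2, 0.0],
    "fair": [0.0, 0.1, 0.9],
}

# Hand-picked toy vectors (assumption): "fair" sits near "fare"
# because the two words sound alike.
ACOUSTIC_AWARE = {
    "cheapest": [0.9, 0.1, 0.0],
    "fare": [0.8, 0.2, 0.0],
    "fair": [0.7, 0.2, 0.2],
}

clean = ["cheapest", "fare"]
noisy = ["cheapest", "fair"]  # invented ASR substitution of a similar-sounding word


def drift(table):
    """Cosine distance between the clean and noisy utterance vectors."""
    return 1.0 - cosine(mean_vector(clean, table), mean_vector(noisy, table))


semantic_drift = drift(SEMANTIC_ONLY)
acoustic_drift = drift(ACOUSTIC_AWARE)
print(f"semantic-only drift:  {semantic_drift:.4f}")
print(f"acoustic-aware drift: {acoustic_drift:.4f}")
```

In this toy setup the acoustic-aware table yields the smaller drift, so a downstream intent classifier would receive nearly the same input for the clean and the noisy utterance.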

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.03576/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1904.03576/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/1904.03576/full.md

---
Source: https://tomesphere.com/paper/1904.03576