TL;DR
This paper empirically investigates how speech recognition errors affect neural open-domain chatbots. It finds that these chatbots are sensitive to ASR errors and that training with synthetic ASR hypotheses mitigates the problem only partially, which underlines the need for speech robustness.
Contribution
It is the first study to evaluate the impact of ASR hypotheses on a state-of-the-art neural dialog system and explores training strategies to improve speech robustness.
Findings
TransferTransfo is sensitive to ASR errors in dialog history.
Synthetic ASR hypotheses during training offer marginal robustness improvements.
The results highlight the importance of speech robustness as an evaluation criterion.
Abstract
Large end-to-end neural open-domain chatbots are becoming increasingly popular. However, research on building such chatbots has typically assumed that the user input is written in nature and it is not clear whether these chatbots would seamlessly integrate with automatic speech recognition (ASR) models to serve the speech modality. We aim to bring attention to this important question by empirically studying the effects of various types of synthetic and actual ASR hypotheses in the dialog history on TransferTransfo, a state-of-the-art Generative Pre-trained Transformer (GPT) based neural open-domain dialog system from the NeurIPS ConvAI2 challenge. We observe that TransferTransfo trained on written data is very sensitive to such hypotheses introduced to the dialog history during inference time. As a baseline mitigation strategy, we introduce synthetic ASR hypotheses to the dialog history…
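To make the idea of synthetic ASR hypotheses in the dialog history concrete, here is a minimal Python sketch. It is not the paper's method: the excerpt above does not say how the hypotheses were generated, so the function names (`strip_written_cues`, `corrupt_words`, `synthetic_asr_history`), the default error rate, and the uniform choice among deletion, substitution, and insertion are all assumptions made for illustration. The sketch removes casing and punctuation, which raw ASR output typically lacks, and then perturbs words at roughly a chosen rate.

```python
import random
import string


def strip_written_cues(utterance: str) -> str:
    """Lowercase and drop ASCII punctuation (apostrophes included) to mimic raw ASR text."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(utterance.lower().translate(table).split())


def corrupt_words(words, error_rate, vocab, rng):
    """Perturb each word with probability error_rate (hypothetical noise model)."""
    out = []
    for word in words:
        if rng.random() < error_rate:
            op = rng.choice(["delete", "substitute", "insert"])
            if op == "delete":
                continue  # drop the word entirely
            if op == "substitute":
                out.append(rng.choice(vocab))  # replace with a random vocabulary word
            else:
                out.extend([word, rng.choice(vocab)])  # keep the word, add a spurious one
        else:
            out.append(word)
    return out


def synthetic_asr_history(history, error_rate=0.15, vocab=None, seed=0):
    """Return a noisy copy of a dialog history (a list of utterance strings).

    If no vocabulary is given, substitutions and insertions draw from the
    words that occur in the history itself (an assumption of this sketch).
    """
    rng = random.Random(seed)
    normalized = [strip_written_cues(u) for u in history]
    if vocab is None:
        vocab = sorted({w for u in normalized for w in u.split()})
    return [
        " ".join(corrupt_words(u.split(), error_rate, vocab, rng))
        for u in normalized
    ]


if __name__ == "__main__":
    history = ["Hi! Do you like hiking?", "Yes, I hike every weekend."]
    print(synthetic_asr_history(history, error_rate=0.3, seed=1))
```

Under these assumptions, a training pipeline would apply `synthetic_asr_history` to the dialog history while leaving the target response in its written form, and the same function could be used at inference time to probe sensitivity at different error rates.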
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Dropout · Label Smoothing · Residual Connection
