# Learning discriminative features in sequence training without requiring framewise labelled data

**Authors:** Jun Wang, Dan Su, Jie Chen, Shulin Feng, Dongpeng Ma, Na Li, Dong Yu

arXiv: 1905.06907 · 2019-05-17

## TL;DR

This paper introduces a novel sequence training method that learns discriminative features without needing framewise labels, improving robustness and reducing word error rates in industrial ASR tasks, especially under noisy conditions.

## Contribution

A new integrated model that learns discriminative features during sequence training without presegmented data, enhancing ASR robustness to acoustic variability.

## Key findings

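- Consistently outperforms state-of-the-art models on a realistic industrial ASR task, without specific fine-tuning or additional complexity
- Reduces WER under all test conditions, with the largest relative reductions (12.94%, 8.66% and 5.80%) under unseen noise conditions
- Generalizes better to acoustic variability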

## Abstract

In this work, we try to answer two questions: Can deeply learned features with discriminative power benefit an ASR system's robustness to acoustic variability? And how can they be learned without requiring framewise labelled sequence training data? As existing methods usually require knowing where the labels occur in the input sequence, their applicability to many real-world sequence learning tasks has so far been limited. We propose a novel method which simultaneously models both the sequence discriminative training and the feature discriminative learning within a single network architecture, so that it can learn discriminative deep features in sequence training, obviating the need for presegmented training data. Our experiment on a realistic industrial ASR task shows that, without requiring any specific fine-tuning or additional complexity, our proposed models consistently outperform state-of-the-art models and significantly reduce Word Error Rate (WER) under all test conditions, with the highest improvements under unseen noise conditions, by relative 12.94%, 8.66% and 5.80%, showing that our proposed models generalize better to acoustic variability.
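The improvements quoted in the abstract are relative WER reductions, which are conventionally computed as the drop in WER divided by the baseline WER. A minimal sketch of that arithmetic follows; the function name and the example WER values are hypothetical illustrations and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer against baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values (in percent), chosen only to illustrate the formula.
baseline = 20.0
proposed = 17.0
print(f"{relative_wer_reduction(baseline, proposed):.2f}% relative reduction")
```

A relative figure depends on the baseline, so the same percentage can correspond to very different absolute changes in WER.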

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.06907/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1905.06907/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/1905.06907/full.md

---
Source: https://tomesphere.com/paper/1905.06907