# Two-stage Training for Chinese Dialect Recognition

**Authors:** Zongze Ren, Guofu Yang, Shugong Xu

arXiv: 1908.02284 · 2019-08-13

## TL;DR

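This paper introduces a two-stage neural network system for Chinese dialect recognition, built on a shallow ResNet14 followed by a 2-layer RNN. It reaches high accuracy with less training time than a three-stage alternative and won first place among 110 teams in the Xunfei (iFlyTek) Chinese Dialect Recognition Challenge.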

## Contribution

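The two-stage approach first trains a CTC acoustic model based on a shallow ResNet14, then trains a simple 2-layer RNN on the acoustic model's intermediate features to classify the dialect. Compared with a three-stage system the authors also explore, it achieves high accuracy with less training time.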

## Key findings

- Won first place among 110 teams in the Xunfei (iFlyTek) Chinese Dialect Recognition Challenge.
- Achieved high accuracy under both short-utterance and long-utterance conditions.
- Required less training time than the three-stage system the authors also explored.

## Abstract

In this paper, we present a two-stage language identification (LID) system based on a shallow ResNet14 followed by a simple 2-layer recurrent neural network (RNN) architecture, which was used for the Xunfei (iFlyTek) Chinese Dialect Recognition Challenge and won first place among 110 teams. The system first trains an acoustic model (AM) with connectionist temporal classification (CTC) to recognize the given phonetic sequence annotation, and then trains another RNN to classify the dialect category, using intermediate features from the AM as inputs. Compared with a three-stage system that we further explore, our results show that the two-stage system can achieve high accuracy for Chinese dialect recognition under both short-utterance and long-utterance conditions with less training time.
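The abstract names CTC as the stage-one training objective but does not spell it out. As a reminder of what that objective computes, the following is a minimal pure-Python sketch of the standard CTC forward recursion, which sums the probability of every frame-level alignment that collapses to the target phonetic sequence. The function name `ctc_neg_log_likelihood`, the symbol ids, and the toy probabilities are illustrative assumptions, not taken from the paper; a practical system would rely on a framework implementation working in log space.

```python
import math


def ctc_neg_log_likelihood(probs, labels, blank=0):
    """Return -log P(labels | probs) under CTC (illustrative sketch).

    probs:  list of T frames, each a list of per-symbol probabilities.
    labels: target symbol ids without blanks (e.g. a phonetic sequence).
    """
    # Interleave blanks around the labels: [a, b] -> [_, a, _, b, _]
    ext = [blank]
    for s in labels:
        ext += [s, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: probability mass of all partial alignments ending at ext[s]
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one position
            # Skip the blank only between two *different* labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid alignment ends on the last label or the trailing blank
    likelihood = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return math.inf if likelihood == 0.0 else -math.log(likelihood)


# Toy example: 2 frames, symbols {0: blank, 1: "a"}, uniform posteriors.
frames = [[0.5, 0.5], [0.5, 0.5]]
# Alignments "aa", "a_", "_a" each have probability 0.25, so P = 0.75.
loss = ctc_neg_log_likelihood(frames, [1])  # equals -log(0.75)
```

Because the loss marginalizes over alignments, stage one needs only the phonetic sequence annotation for each utterance, not frame-level labels. In the toy example the target `[1, 1]` would be unreachable in two frames, since repeated labels require a blank between them, and the function returns infinity for it.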

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.02284/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1908.02284/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/1908.02284/full.md

---
Source: https://tomesphere.com/paper/1908.02284