Topic Classification on Spoken Documents Using Deep Acoustic and Linguistic Features

Tan Liu; Wu Guo; Bin Gu

arXiv:2106.08637·cs.CL·June 17, 2021

TL;DR

This paper classifies the topics of spoken documents by fusing deep acoustic and linguistic features instead of relying on ASR transcripts, outperforming conventional ASR+TTC systems with a reported 3.13% improvement in classification accuracy on a subset of the Switchboard corpus.

Contribution

The paper proposes a framework that takes deep acoustic features from a CTC-based phoneme acoustic model, derives deep linguistic features from them with a phoneme-to-word (P2W) module, and fuses the two feature types with a local multi-head attention module for topic classification.
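The fusion step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: "local" is read as each frame attending only to frames within a fixed window, the two feature sequences are assumed to be frame-aligned with equal dimensions, the learned query/key/value projections of a real multi-head attention layer are omitted, and the names `local_attention_head` and `fuse` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def local_attention_head(queries, keys, values, window):
    """Scaled dot-product attention; frame t only sees frames in [t - window, t + window]."""
    d = len(queries[0])
    out = []
    for t, q in enumerate(queries):
        lo, hi = max(0, t - window), min(len(keys), t + window + 1)
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
                  for j in range(lo, hi)]
        weights = softmax(scores)
        out.append([sum(w * values[lo + j][k] for j, w in enumerate(weights))
                    for k in range(len(values[0]))])
    return out

def fuse(acoustic, linguistic, num_heads=2, window=1):
    """Acoustic frames query linguistic frames, one slice of dimensions per head.

    Returns, per frame, the acoustic vector concatenated with the attended
    linguistic context (assumption: no learned projections, equal shapes).
    """
    d = len(acoustic[0])
    assert len(acoustic) == len(linguistic), "sketch assumes frame-aligned inputs"
    assert len(linguistic[0]) == d and d % num_heads == 0
    size = d // num_heads
    fused = [list(frame) for frame in acoustic]
    for h in range(num_heads):
        dims = slice(h * size, (h + 1) * size)
        q = [f[dims] for f in acoustic]
        kv = [f[dims] for f in linguistic]
        head = local_attention_head(q, kv, kv, window)
        for t in range(len(fused)):
            fused[t].extend(head[t])
    return fused
```

With `window=0` each frame attends only to itself, so the output reduces to a plain concatenation of the two feature vectors; widening the window lets neighbouring linguistic frames contribute to each fused frame.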

Findings

01. Outperforms conventional ASR+TTC systems in classification accuracy.

02. Achieves a reported 3.13% improvement in classification accuracy.

03. Results are demonstrated on a subset of the Switchboard corpus.

Abstract

Topic classification systems on spoken documents usually consist of two modules: an automatic speech recognition (ASR) module to convert speech into text and a text topic classification (TTC) module to predict the topic class from the decoded text. In this paper, instead of using the ASR transcripts, the fusion of deep acoustic and linguistic features is used for topic classification on spoken documents. More specifically, a conventional CTC-based acoustic model (AM) using phonemes as output units is first trained, and the outputs of the layer before the linear phoneme classifier in the trained AM are used as the deep acoustic features of spoken documents. Furthermore, these deep acoustic features are fed to a phoneme-to-word (P2W) module to obtain deep linguistic features. Finally, a local multi-head attention module is proposed to fuse these two types of deep features for topic…
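The feature-extraction idea in the abstract (use the output of the layer before the linear phoneme classifier, then pass it to the P2W module) can be sketched with toy stand-ins. This is a minimal illustration under stated assumptions: `ToyNet` is a hypothetical one-hidden-layer network with random weights in place of trained ones, all dimensions are arbitrary, and taking the P2W module's own pre-classifier layer as the linguistic feature is an assumption, since the source does not say which P2W layer is used.

```python
import math
import random

class ToyNet:
    """Hypothetical stand-in for a trained network: one tanh layer plus a linear classifier."""

    def __init__(self, in_dim, hidden_dim, out_dim, seed):
        rng = random.Random(seed)  # random weights stand in for trained ones
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(in_dim)]
                         for _ in range(hidden_dim)]
        self.w_out = [[rng.uniform(-1, 1) for _ in range(hidden_dim)]
                      for _ in range(out_dim)]

    def penultimate(self, x):
        """Output of the layer just before the linear classifier."""
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w_hidden]

    def logits(self, x):
        """Classifier scores; needed for training, not for feature extraction."""
        h = self.penultimate(x)
        return [sum(w * v for w, v in zip(row, h)) for row in self.w_out]

NUM_PHONEMES, NUM_WORDS = 5, 7  # arbitrary toy sizes
# Acoustic model: phoneme outputs plus one extra unit for the CTC blank symbol.
acoustic_model = ToyNet(in_dim=3, hidden_dim=4, out_dim=NUM_PHONEMES + 1, seed=0)
# P2W module: consumes deep acoustic features, classifies over words.
p2w_module = ToyNet(in_dim=4, hidden_dim=4, out_dim=NUM_WORDS, seed=1)

frames = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.5], [0.1, 0.1, 0.1]]  # toy input frames
acoustic_feats = [acoustic_model.penultimate(f) for f in frames]
linguistic_feats = [p2w_module.penultimate(a) for a in acoustic_feats]
```

The classifier outputs (`logits`) are never called during extraction: the phoneme and word targets only shape the hidden representations during training, and the downstream topic classifier consumes those representations rather than decoded text.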

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing

Methods: Attention Model · Softmax · Linear Layer