Topic Identification For Spontaneous Speech: Enriching Audio Features With Embedded Linguistic Information

Dejan Porjazovski; Tamás Grósz; Mikko Kurimo

arXiv:2307.11450 · eess.AS · July 24, 2023

TL;DR

This paper explores methods for topic identification in spontaneous speech. It shows that audio-only models are a viable option when no ASR component is available, and that hybrid models combining audio and text features outperform single-modality approaches.

Contribution

It introduces and evaluates hybrid audio-text models for topic identification, demonstrating their effectiveness over standard text-only methods in spontaneous speech scenarios.

Findings

01 Audio-only models are effective without ASR in low-resource settings.

02 Hybrid models combining audio and text outperform single-modality approaches.

03 Spontaneous speech with hesitations challenges traditional ASR-based methods.

Abstract

Traditional topic identification solutions from audio rely on an automatic speech recognition system (ASR) to produce transcripts used as input to a text-based model. These approaches work well in high-resource scenarios, where there are sufficient data to train both components of the pipeline. However, in low-resource situations, the ASR system, even if available, produces low-quality transcripts, leading to a bad text-based classifier. Moreover, spontaneous speech containing hesitations can further degrade the performance of the ASR model. In this paper, we investigate alternatives to the standard text-only solutions by comparing audio-only and hybrid techniques of jointly utilising text and audio features. The models evaluated on spontaneous Finnish speech demonstrate that purely audio-based solutions are a viable option when ASR components are not available, while the hybrid…
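
To make the idea of jointly utilising text and audio features more concrete, the following is a minimal sketch of one generic fusion scheme: mean-pool frame-level audio features into an utterance vector, concatenate it with a text embedding, and pass the result through a linear layer with softmax. This is an illustrative assumption, not the authors' architecture; the function names (`mean_pool`, `fuse_and_classify`), the feature dimensions, and the random weights are all hypothetical, and a real system would learn the weights from labelled data.

```python
import math
import random


def mean_pool(frames):
    """Average per-frame feature vectors into a single utterance-level vector."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(audio_frames, text_embedding, weights, bias):
    """Concatenate pooled audio features with a text embedding (hypothetical
    fusion scheme), then apply a linear layer followed by softmax."""
    fused = mean_pool(audio_frames) + list(text_embedding)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy inputs: 10 audio frames of dimension 4 and a text embedding of dimension 3.
# All sizes and values are made up for illustration.
rng = random.Random(0)
audio_frames = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(10)]
text_embedding = [rng.gauss(0, 1) for _ in range(3)]

num_topics = 5
fused_dim = 4 + 3
# Untrained random weights; in practice these would be learned.
weights = [[rng.gauss(0, 0.1) for _ in range(fused_dim)] for _ in range(num_topics)]
bias = [0.0] * num_topics

probs = fuse_and_classify(audio_frames, text_embedding, weights, bias)
predicted_topic = max(range(num_topics), key=probs.__getitem__)
```

In this sketch an audio-only variant would simply drop the text embedding from the concatenation, which is one way to picture the case where no ASR transcript is available.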

Code & Models

Repositories

aalto-speech/Topic-identification-for-spontaneous-Finnish-speech
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems