Back to Supervision: Boosting Word Boundary Detection through Frame Classification

Simone Carnemolla; Salvatore Calcagno; Simone Palazzo; Daniela Giordano

arXiv:2411.10423·cs.LG·November 18, 2024

TL;DR

This paper introduces a supervised, model-agnostic framework for word boundary detection in speech. It combines label augmentation with output-frame selection and reports state-of-the-art results on the Buckeye and TIMIT datasets.

Contribution

It presents a supervised approach with label augmentation and output-frame selection that, with a HuBERT encoder, outperforms existing architectures trained in either supervised or self-supervised settings on the same datasets.

Findings

01. HuBERT encoder achieves highest performance

02. State-of-the-art F-values on Buckeye and TIMIT datasets

03. Robust preprocessing method for audio tokenization
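
The F-values in the findings are presumably boundary-level F-scores, the harmonic mean of precision and recall over detected boundaries. The sketch below shows one common way to compute such a score, assuming a predicted boundary counts as correct when it lies within a fixed tolerance of a reference boundary that has not already been matched. The function name `boundary_f_value` and the 20 ms default tolerance are assumptions for illustration and are not taken from this page.

```python
from typing import Sequence


def boundary_f_value(pred_ms: Sequence[int], ref_ms: Sequence[int], tol_ms: int = 20) -> float:
    """Boundary F-score with a tolerance window (hypothetical helper).

    pred_ms / ref_ms are boundary times in milliseconds.
    tol_ms is an assumed tolerance; each reference boundary can be matched once.
    """
    unmatched = sorted(ref_ms)
    hits = 0
    for p in sorted(pred_ms):
        # Greedy one-to-one matching: take the earliest unmatched reference in range.
        match = next((r for r in unmatched if abs(r - p) <= tol_ms), None)
        if match is not None:
            unmatched.remove(match)
            hits += 1

    precision = hits / len(pred_ms) if pred_ms else 0.0
    recall = hits / len(ref_ms) if ref_ms else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example: two of three predictions fall within 20 ms of a reference boundary.
    print(boundary_f_value([100, 310, 500], [110, 300, 700, 900]))
```

Greedy matching keeps the sketch short; an evaluation script may use a different matching rule or tolerance, so scores from this helper are not comparable to the reported numbers.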

Abstract

Speech segmentation at both word and phoneme levels is crucial for various speech processing tasks. It significantly aids in extracting meaningful units from an utterance, thus enabling the generation of discrete elements. In this work, we propose a model-agnostic framework that performs word boundary detection in a supervised manner, employing a label augmentation technique and an output-frame selection strategy. We trained and tested on the Buckeye dataset and used TIMIT for testing only, with state-of-the-art encoder models, including pre-trained solutions (Wav2Vec 2.0 and HuBERT), as well as convolutional and convolutional recurrent networks. Our method, with the HuBERT encoder, surpasses the performance of other state-of-the-art architectures, whether trained in supervised or self-supervised settings on the same datasets. Specifically, we achieved F-values of 0.8427 on the Buckeye…
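
The abstract names three ingredients: frame-level classification, label augmentation, and output-frame selection. The sketch below illustrates one plausible reading of each, and it is not the authors' implementation. It assumes a 20 ms frame hop, assumes that label augmentation means also marking the frames adjacent to each boundary as positive, and assumes that frame selection means keeping the highest-probability frame in each run of above-threshold frames. The names `boundary_labels`, `select_frames`, `widen`, and `threshold` are hypothetical.

```python
from typing import List, Sequence

FRAME_SHIFT_MS = 20  # assumed frame hop; not stated on this page


def boundary_labels(boundaries_ms: Sequence[int], n_frames: int, widen: int = 1) -> List[int]:
    """Frame-level targets: 1 for frames at a word boundary, 0 elsewhere.

    widen > 0 also marks that many neighbouring frames on each side,
    an assumed form of label augmentation.
    """
    labels = [0] * n_frames
    for t in boundaries_ms:
        centre = t // FRAME_SHIFT_MS
        for i in range(centre - widen, centre + widen + 1):
            if 0 <= i < n_frames:
                labels[i] = 1
    return labels


def select_frames(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Keep one frame per run of above-threshold frames (assumed selection rule)."""
    selected: List[int] = []
    run: List[int] = []
    for i, p in enumerate(probs):
        if p >= threshold:
            run.append(i)
        elif run:
            selected.append(max(run, key=lambda j: probs[j]))
            run = []
    if run:  # close a run that reaches the last frame
        selected.append(max(run, key=lambda j: probs[j]))
    return selected


if __name__ == "__main__":
    print(boundary_labels([100], n_frames=10))                         # targets for training
    print(select_frames([0.1, 0.6, 0.9, 0.7, 0.2, 0.1, 0.8, 0.3]))     # frames kept at inference
```

Under these assumptions, widening the targets reduces the imbalance between boundary and non-boundary frames during training, and the selection step collapses each cluster of positive frames back into a single predicted boundary.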

Code & Models

Repositories

simonecarnemolla/Word-Segmenter
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis