GIPFA: Generating IPA Pronunciation from Audio

Xavier Marjou

arXiv:2006.07573 · cs.CL · September 23, 2021

TL;DR

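This paper introduces GIPFA, an ANN-based system that generates the IPA phonemic pronunciation of a word from its audio. Trained on the French Wikimedia dictionary, it correctly predicted 75% of the IPA pronunciations tested, and its inference errors helped flag possible dataset errors and identify the closest phonemes in French.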

Contribution

The study presents an artificial neural network approach to automatic IPA transcription of a word from its audio, trained on French data, and shows that studying inference errors can expose possible dataset errors and the closest phonemes in French.

Findings

01. The model correctly predicted 75% of the IPA pronunciations tested.

02. Studying inference errors highlighted possible errors in the dataset and identified the closest phonemes in French.

03. The results demonstrate the feasibility of automated IPA transcription from audio.

Abstract

Transcribing spoken audio samples into the International Phonetic Alphabet (IPA) has long been reserved for experts. In this study, we examine the use of an Artificial Neural Network (ANN) model to automatically extract the IPA phonemic pronunciation of a word from its audio pronunciation, hence the name Generating IPA Pronunciation From Audio (GIPFA). We trained our model on the French Wikimedia dictionary, and it then correctly predicted 75% of the IPA pronunciations tested. Interestingly, studying the inference errors made it possible to highlight likely errors in the dataset and to identify the closest phonemes in French.
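
As an illustration only, the sketch below shows one way the two measurements mentioned in the abstract could be computed: exact-match accuracy over predicted IPA strings, and a count of phoneme substitutions that might hint at which phonemes are confused most often. The function names, the toy word list, and the position-by-position comparison are all assumptions made for this example; they are not taken from the paper or its repository.

```python
from collections import Counter


def exact_match_accuracy(predicted, reference):
    """Fraction of items whose predicted IPA string equals the reference exactly."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def substitution_counts(predicted, reference):
    """Count (reference_symbol, predicted_symbol) pairs in mismatched predictions.

    Simplifications (assumptions of this sketch):
    - only pairs of equal length are compared, position by position;
      a fuller analysis would align the strings with an edit distance;
    - each Unicode code point is treated as one phoneme, which does not
      hold for symbols written with combining diacritics.
    """
    counts = Counter()
    for p, r in zip(predicted, reference):
        if p != r and len(p) == len(r):
            counts.update((rc, pc) for rc, pc in zip(r, p) if rc != pc)
    return counts


# Hypothetical toy data, not drawn from the paper's dataset.
reference = ["ʃa", "bo", "ly", "pe"]
predicted = ["ʃa", "bo", "lu", "pɛ"]

accuracy = exact_match_accuracy(predicted, reference)
confusions = substitution_counts(predicted, reference)

print(f"exact-match accuracy: {accuracy:.2f}")
for (ref_symbol, pred_symbol), n in confusions.most_common():
    print(f"{ref_symbol} predicted as {pred_symbol}: {n}")
```

On real data, the most frequent substitution pairs would be candidates for "closest phonemes", and reference entries that the model contradicts consistently would be candidates for manual review as possible dataset errors.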

Code & Models

Repositories

marxav/gipfa (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing