The Development of a Comprehensive Spanish Dictionary for Phonetic and Lexical Tagging in Socio-phonetic Research (ESPADA)
Simon Gonzalez

TL;DR
This paper introduces ESPADA, a comprehensive open-source Spanish pronunciation dictionary with over 628,000 entries covering 16 countries, designed to improve phonetic and lexical analysis in socio-phonetic research.
Contribution
The creation of ESPADA, the largest and most inclusive Spanish pronunciation dictionary, which integrates dialectal variation with morphological, lexical, and phonetic annotations for socio-phonetic studies.
Findings
Over 628,000 entries covering 16 countries
Enhanced dialectal and phonetic analysis capabilities
Open-source resource for socio-phonetic research
Abstract
Pronunciation dictionaries are an important component of forced alignment of speech. Their accuracy strongly affects the aligned speech data, since they support the mapping between orthographic transcriptions and acoustic signals. In this paper, I present the creation of a comprehensive Spanish pronunciation dictionary (ESPADA) that can be used with data from most dialectal variants of Spanish. Current dictionaries focus on specific regional variants, whereas the flexible design of this tool allows it to be readily applied to capture the most common phonetic differences across major dialectal variants. I propose improvements to current pronunciation dictionaries, as well as the mapping of other relevant annotations such as morphological and lexical information. In terms of size, it is currently the most complete dictionary with more than 628,000 entries,…
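As a rough illustration of how a pronunciation dictionary maps orthographic words to phone sequences for different dialectal variants, here is a minimal sketch. The entry layout, the variant labels (`distincion`, `seseo`), the phone symbols, and the names `TOY_DICT` and `transcribe` are all assumptions made for this example; they are not ESPADA's actual format or API, which the abstract does not describe.

```python
from typing import Dict, List

# Hypothetical toy entries: word -> variant label -> list of phones.
# This is NOT the ESPADA file format; it only illustrates the idea of
# variant-specific pronunciations for the same orthographic word.
TOY_DICT: Dict[str, Dict[str, List[str]]] = {
    "zapato": {
        "distincion": ["θ", "a", "p", "a", "t", "o"],  # z realised as /θ/
        "seseo": ["s", "a", "p", "a", "t", "o"],       # z realised as /s/
    },
    "casa": {
        "distincion": ["k", "a", "s", "a"],
        "seseo": ["k", "a", "s", "a"],
    },
}


def transcribe(words: List[str], variant: str) -> List[List[str]]:
    """Look up each word and return its phones for the chosen variant.

    Raises KeyError for out-of-vocabulary words or unknown variants,
    which is where a larger dictionary would matter for forced alignment.
    """
    phones: List[List[str]] = []
    for word in words:
        entry = TOY_DICT[word.lower()]  # KeyError if the word is missing
        phones.append(entry[variant])   # KeyError if the variant is missing
    return phones


if __name__ == "__main__":
    for variant in ("distincion", "seseo"):
        print(variant, transcribe(["zapato", "casa"], variant))
```

A forced aligner would consume phone sequences like these alongside the audio; the sketch only covers the lookup step.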
Taxonomy
Topics: Spanish Linguistics and Language Studies · Linguistic Studies and Language Acquisition
