Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation

Abbas Bertina; Shahab Beirami; Hossein Biniazian; Elham Esmaeilnia; Soheil Shahi; Mahdi Pirnia

arXiv:2505.06599·cs.CL·May 13, 2025

Open Access

TL;DR

This paper presents a novel intermediate language and a hybrid approach combining LLM prompting and sequence-to-sequence models to improve Persian G2P conversion, especially for homographs with multiple pronunciations, achieving state-of-the-art accuracy.

Contribution

It introduces an intermediate language tailored for Persian, integrating LLM prompts and a specialized transliteration architecture to disambiguate homographs and enhance G2P accuracy.

Findings

01. Significant reduction in Phoneme Error Rate compared to previous methods

02. Effective disambiguation of homographs with multiple pronunciations

03. Benchmark performance surpassing existing state-of-the-art approaches
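
The findings above are stated in terms of Phoneme Error Rate (PER). This summary does not say how the paper computes it, so the following is only a minimal sketch of the conventional definition: total Levenshtein edit distance between predicted and reference phoneme sequences, divided by the total number of reference phonemes. The function names and the toy phoneme symbols are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: Sequence[Sequence[str]],
                       hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level PER: total edits divided by total reference phonemes."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of items")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference set contains no phonemes")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_ref


# Toy example with made-up phoneme symbols: one substitution in five phonemes.
reference = [["k", "e", "t", "a", "b"]]
predicted = [["k", "a", "t", "a", "b"]]
per = phoneme_error_rate(reference, predicted)
```

In this toy case `per` works out to 1/5, since a single substituted phoneme is counted against five reference phonemes. Real evaluations may differ in details such as per-word versus corpus-level averaging.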

Abstract

Grapheme-to-phoneme (G2P) conversion for Persian presents unique challenges due to its complex phonological features, particularly homographs and Ezafe, which occur in both formal and informal language contexts. This paper introduces an intermediate language specifically designed for Persian language processing that addresses these challenges through a multi-faceted approach. Our methodology combines two key components: Large Language Model (LLM) prompting techniques and a specialized sequence-to-sequence machine transliteration architecture. We developed and implemented a systematic approach for constructing a comprehensive lexical database for disambiguating homographs with multiple pronunciations, often termed polyphones, utilizing formal concept analysis for semantic differentiation. We train our model using two distinct datasets: the LLM-generated dataset for formal and informal Persian…

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling