Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech

Yakov Kolani; Maxim Melichov; Cobi Calev; Morris Alper

arXiv:2506.12311·cs.CL·October 13, 2025

Open Access · 1 Dataset

TL;DR

Phonikud is a new lightweight Hebrew G2P system that produces fully-specified IPA transcriptions, improving real-time TTS accuracy and speed, supported by a new benchmark dataset.

Contribution

We introduce Phonikud, a novel Hebrew G2P system with minimal latency, and the ILSpeech dataset for benchmarking Hebrew phonetic conversion and TTS evaluation.

Findings

01 Phonikud predicts phonemes from Hebrew text more accurately than prior G2P methods.

02 Phonikud enables real-time Hebrew TTS with a better speed-accuracy tradeoff.

03 The ILSpeech dataset provides a new benchmark for Hebrew speech and phonetic research.

Abstract

Real-time text-to-speech (TTS) for Modern Hebrew is challenging due to the language's orthographic complexity. Existing solutions ignore crucial phonetic features such as stress that remain underspecified even when vowel marks are added. To address these limitations, we introduce Phonikud, a lightweight, open-source Hebrew grapheme-to-phoneme (G2P) system that outputs fully-specified IPA transcriptions. Our approach adapts an existing diacritization model with lightweight adaptors, incurring negligible additional latency. We also contribute the ILSpeech dataset of transcribed Hebrew speech with IPA annotations, serving as a benchmark for Hebrew G2P, as training data for TTS systems, and enabling audio-to-IPA for evaluating TTS performance while capturing important phonetic details. Our results demonstrate that Phonikud G2P conversion more accurately predicts phonemes from Hebrew text…
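
To make the notion of phoneme-level accuracy concrete, the sketch below computes a phoneme error rate (PER) as edit distance divided by reference length. This is a common way to score G2P output, but it is an assumption here: the text above does not name the metric the paper uses. The function names, the choice to treat the IPA stress mark as its own token, and the example transcriptions are all illustrative and are not taken from Phonikud or ILSpeech.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalized by reference length (hypothetical helper)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Illustrative token sequences; the stress mark is treated as a separate token.
reference = ["b", "i", "ˈ", "r", "a"]   # stress on the final syllable
hypothesis = ["ˈ", "b", "i", "r", "a"]  # stress on the first syllable
print(phoneme_error_rate(reference, hypothesis))
```

Treating stress as a token means a misplaced stress mark is penalized even when every consonant and vowel is correct, which is the kind of detail a fully-specified IPA transcription is meant to capture.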

Code & Models

Datasets

thewh1teagle/phonikud-phonemes-data
dataset · 86 downloads

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems