SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
Minchan Kim, Myeonghun Jeong, Joun Yeop Lee, Nam Soo Kim

TL;DR
SegINR introduces a segment-wise implicit neural representation for sequence alignment in neural TTS, eliminating the need for duration predictors and complex frame-level modeling, leading to improved speech quality and efficiency.
Contribution
It proposes a novel segment-wise INR method that models temporal dynamics within each segment and determines segment boundaries automatically, simplifying the neural TTS pipeline.
Findings
Outperforms conventional methods in zero-shot adaptive TTS
Achieves higher speech quality with lower computational costs
Effectively models temporal dynamics within speech segments
Abstract
We present SegINR, a novel approach to neural Text-to-Speech (TTS) that addresses sequence alignment without relying on an auxiliary duration predictor and complex autoregressive (AR) or non-autoregressive (NAR) frame-level sequence modeling. SegINR simplifies the process by converting text sequences directly into frame-level features. It leverages an optimal text encoder to extract embeddings, transforming each into a segment of frame-level features using a conditional implicit neural representation (INR). This method, named segment-wise INR (SegINR), models temporal dynamics within each segment and autonomously defines segment boundaries, reducing computational costs. We integrate SegINR into a two-stage TTS framework, using it for semantic token prediction. Our experiments in zero-shot adaptive TTS scenarios demonstrate that SegINR outperforms conventional methods in speech quality…
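The segment-wise expansion described in the abstract can be illustrated with a toy sketch. It assumes details the abstract does not state: a tiny untrained MLP stands in for the conditional INR, the frame index is encoded with a sine/cosine pair, and a segment ends when a hypothetical end-of-segment token (`EOS`) is the most likely output. All names, sizes, and the stopping rule are illustrative assumptions, not the authors' implementation.

```python
import math
import random

# All constants below are hypothetical values chosen for illustration.
EOS = 0          # assumed end-of-segment token id
VOCAB = 8        # assumed number of token classes, including EOS
EMB_DIM = 4      # assumed text-embedding size
HIDDEN = 16      # hidden width of the toy MLP
MAX_FRAMES = 6   # safety cap on frames emitted per segment


def init_inr(seed=0):
    """Random, untrained weights for a tiny two-layer MLP."""
    rng = random.Random(seed)
    in_dim = EMB_DIM + 2  # embedding plus sin/cos encoding of the frame index
    w1 = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(HIDDEN)]
    w2 = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]
    return w1, w2


def inr_token(params, embedding, t):
    """Query the conditional INR at frame index t; return the argmax token."""
    w1, w2 = params
    x = list(embedding) + [math.sin(t), math.cos(t)]
    h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in w1]
    logits = [sum(w * v for w, v in zip(row, h)) for row in w2]
    return max(range(VOCAB), key=logits.__getitem__)


def expand_segment(params, embedding):
    """Emit frames for one text embedding until EOS (or the cap) is reached."""
    frames = []
    for t in range(MAX_FRAMES):
        token = inr_token(params, embedding, t)
        if token == EOS:  # assumed boundary rule: EOS closes the segment
            break
        frames.append(token)
    return frames


def expand_text(params, embeddings):
    """Concatenate per-embedding segments into one frame-level sequence."""
    segments = [expand_segment(params, e) for e in embeddings]
    return segments, [tok for seg in segments for tok in seg]


if __name__ == "__main__":
    rng = random.Random(1)
    text_embeddings = [[rng.gauss(0, 1) for _ in range(EMB_DIM)] for _ in range(3)]
    segments, frames = expand_text(init_inr(), text_embeddings)
    print("segment lengths:", [len(s) for s in segments])
    print("frame-level tokens:", frames)
```

In this sketch each frame depends only on its text embedding and its index within the segment, so no separate duration predictor or frame-to-frame recurrence appears; segment length falls out of where the assumed `EOS` token is produced.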
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
