Super Monotonic Alignment Search
Junhyeok Lee, Hyeongju Kim

TL;DR
This paper introduces Super Monotonic Alignment Search (Super-MAS), a GPU implementation of monotonic alignment search (MAS) that speeds up text-to-speech alignment by parallelizing the algorithm along the text-length dimension and avoiding copies between CPU and GPU.
Contribution
The paper presents a GPU implementation of MAS, written as a Triton kernel and a PyTorch JIT script, with the Triton kernel running up to 72 times faster than CPU-based execution in the extreme-length case.
Findings
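The Super-MAS Triton kernel is up to 72 times faster in the extreme-length case.
MAS can be parallelized along the text-length dimension, and running it on the GPU removes the inter-device copy that consumes an inordinate amount of time in CPU execution.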
Code is publicly available for reproducibility.
Abstract
Monotonic alignment search (MAS), introduced by Glow-TTS, is one of the most popular algorithms in text-to-speech for estimating unknown alignments between text and speech. Since this algorithm needs to search for the most probable alignment with dynamic programming by caching all possible paths, the time complexity of the algorithm is $O(T \times S)$, where $T$ is the length of the text and $S$ is the length of the speech representation. The authors of Glow-TTS run this algorithm on CPU, and while they mentioned it is difficult to parallelize, we found that MAS can be parallelized in the text-length dimension and that CPU execution consumes an inordinate amount of time for inter-device copy. Therefore, we implemented a Triton kernel and a PyTorch JIT script to accelerate MAS on GPU without inter-device copy. As a result, the Super-MAS Triton kernel is up to 72 times faster in the extreme-length case. The code is…
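The dynamic program described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' Triton kernel or PyTorch JIT script; the function names (mas_loop, mas_text_parallel, backtrack) and the (T, S) layout of log_p are assumptions made for illustration. The sketch assumes the usual MAS constraints (the path starts at the first token, ends at the last token, and advances by at most one token per speech frame) and compares a nested-loop version with one that updates every text position of a frame in a single vectorized step, which is the dimension the abstract says can be parallelized.

```python
import numpy as np


def backtrack(Q):
    """Recover a 0/1 alignment path from a filled score table Q of shape (T, S)."""
    T, S = Q.shape
    path = np.zeros((T, S), dtype=np.int64)
    i = T - 1  # the last speech frame is aligned to the last text token
    for j in range(S - 1, -1, -1):
        path[i, j] = 1
        # Step back one token if forced (i == j) or if the diagonal predecessor scores higher.
        if i != 0 and (i == j or Q[i, j - 1] < Q[i - 1, j - 1]):
            i -= 1
    return path


def mas_loop(log_p):
    """Nested-loop MAS: O(T * S) sequential cell updates."""
    T, S = log_p.shape
    assert T <= S, "each token needs at least one frame"
    Q = np.full((T, S), -np.inf)
    Q[0, 0] = log_p[0, 0]
    for j in range(1, S):
        for i in range(min(j + 1, T)):
            stay = Q[i, j - 1]                            # same token as previous frame
            move = Q[i - 1, j - 1] if i > 0 else -np.inf  # advance by one token
            Q[i, j] = max(stay, move) + log_p[i, j]
    return backtrack(Q)


def mas_text_parallel(log_p):
    """Same recurrence, but all T text positions of a frame are updated at once."""
    T, S = log_p.shape
    assert T <= S, "each token needs at least one frame"
    Q = np.full((T, S), -np.inf)
    Q[0, 0] = log_p[0, 0]
    for j in range(1, S):  # this loop over speech frames stays sequential
        prev = Q[:, j - 1]
        shifted = np.concatenate(([-np.inf], prev[:-1]))  # Q[i - 1, j - 1] for every i
        Q[:, j] = np.maximum(prev, shifted) + log_p[:, j]
    return backtrack(Q)


if __name__ == "__main__":
    toy = np.array([[0.0, 0.0, -5.0],
                    [-5.0, -5.0, 0.0]])
    print(mas_text_parallel(toy))
```

Each column of the score table depends only on the previous column, so the T updates within a frame are independent of one another. The vectorized step here stands in for the work a GPU kernel could distribute across threads, while the loop over the S frames remains sequential.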
Taxonomy
Topics: Data Management and Algorithms · Scheduling and Timetabling Solutions · Constraint Satisfaction and Optimization
Methods: Affine Coupling · Normalizing Flows · Invertible 1x1 Convolution · Activation Normalization · GLOW · Glow-TTS · Mixing Adam and SGD
