RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer
Seongho Hong, Yong-Hoon Choi

TL;DR
RingFormer is a neural vocoder that combines ring attention with a convolution-augmented transformer (Conformer) to generate high-quality audio efficiently and in real time, capturing both local and global information in long sequences.
Contribution
The paper introduces RingFormer, a lightweight neural vocoder that integrates ring attention into a Conformer, enabling efficient, real-time audio synthesis with improved global and local feature modeling.
Findings
Achieves comparable or better quality than state-of-the-art vocoders.
Excels in real-time audio generation tasks.
Outperforms existing models on objective and subjective metrics.
Abstract
While transformers demonstrate outstanding performance across various audio tasks, their application to neural vocoders remains challenging. Neural vocoders require the generation of long audio signals at the sample level, which demands high temporal resolution. This results in significant computational costs for attention map generation and limits their ability to efficiently process both global and local information. Additionally, the sequential nature of sample generation in neural vocoders poses difficulties for real-time processing, making the direct adoption of transformers impractical. To address these challenges, we propose RingFormer, a neural vocoder that incorporates the ring attention mechanism into a lightweight transformer variant, the convolution-augmented transformer (Conformer). Ring attention effectively captures local details while integrating global information,…
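The abstract does not spell out how ring attention is computed, so the following is only a minimal sketch of the general blockwise idea, not the authors' implementation. It assumes the usual formulation: queries are split into blocks, each held by one "device", while key/value blocks rotate around a ring, and an online softmax merges the partial results. This way no device ever builds the full attention map. The function names (`full_attention`, `ring_attention`), the single-process simulation of the ring, and the toy sizes are all hypothetical choices made for illustration.

```python
import math
import random


def full_attention(q, k, v):
    """Reference softmax attention over lists of float vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * vj[d] for wj, vj in zip(w, v)) / z
                    for d in range(len(v[0]))])
    return out


def ring_attention(q, k, v, num_blocks):
    """Simulated ring attention (assumed formulation, single process).

    Device `dev` owns query block `dev`. At each step it attends to one
    key/value block, then the blocks shift one position around the ring.
    """
    n, dv = len(q), len(v[0])
    assert n % num_blocks == 0, "sequence length must divide evenly"
    size = n // num_blocks
    scale = 1.0 / math.sqrt(len(q[0]))
    blocks = [(k[i:i + size], v[i:i + size]) for i in range(0, n, size)]

    # Running statistics of the online softmax, one entry per query.
    run_max = [-math.inf] * n
    run_sum = [0.0] * n
    acc = [[0.0] * dv for _ in range(n)]

    for step in range(num_blocks):
        for dev in range(num_blocks):
            # Key/value block that device `dev` holds at this step.
            kb, vb = blocks[(dev + step) % num_blocks]
            for i in range(dev * size, (dev + 1) * size):
                scores = [scale * sum(a * b for a, b in zip(q[i], kj))
                          for kj in kb]
                new_max = max(run_max[i], max(scores))
                # Rescale earlier partial sums to the new running maximum.
                corr = math.exp(run_max[i] - new_max)
                w = [math.exp(s - new_max) for s in scores]
                run_sum[i] = run_sum[i] * corr + sum(w)
                acc[i] = [a * corr + sum(wj * vj[d] for wj, vj in zip(w, vb))
                          for d, a in enumerate(acc[i])]
                run_max[i] = new_max

    return [[a / run_sum[i] for a in acc[i]] for i in range(n)]


# Toy check: the blockwise result should match full attention.
random.seed(0)
n, d = 8, 4
q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
k = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
v = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

ref = full_attention(q, k, v)
out = ring_attention(q, k, v, num_blocks=4)
max_err = max(abs(a - b) for r, o in zip(ref, out) for a, b in zip(r, o))
print("max abs difference:", max_err)
```

In this sketch each device scores only one block of keys at a time, so the largest score matrix it holds is (n / num_blocks) × (n / num_blocks) rather than n × n, while the final output still reflects every position in the sequence.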
Taxonomy
Topics: Neural Networks and Applications
Methods: Softmax · Attention Is All You Need · HiFi-GAN
