Conformer-based End-to-end Speech Recognition With Rotary Position Embedding

Shengqiang Li; Menglong Xu; Xiao-Lei Zhang

arXiv:2107.05907·cs.SD·July 14, 2021

TL;DR

This paper explores the use of rotary position embedding (RoPE) in conformer-based end-to-end speech recognition models, demonstrating improved accuracy by effectively encoding positional information.

Contribution

It investigates position embedding methods in the conformer and adopts rotary position embedding (RoPE) to encode position, improving speech recognition performance.

Findings

01

RoPE encodes absolute positional information with a rotation matrix and thereby incorporates explicit relative position information.

02

RoPE reduces word error rate on LibriSpeech benchmarks.

03

Enhanced conformer models outperform previous methods.
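
The first finding can be written out as a short derivation. This is a sketch under stated assumptions: the symbols q, k, m, n, d, and θ_i, and the base of 10000, follow the commonly used formulation of rotary embeddings and do not appear on this page, so the paper's exact notation and constants may differ.

```latex
% Assumed notation: q, k are query/key vectors of even dimension d,
% m, n are their absolute positions, and \theta_i is the i-th rotation frequency.
\[
R_m = \operatorname{diag}\bigl(R(m\theta_1), \dots, R(m\theta_{d/2})\bigr),
\qquad
R(\phi) =
\begin{pmatrix}
\cos\phi & -\sin\phi \\
\sin\phi & \cos\phi
\end{pmatrix},
\qquad
\theta_i = 10000^{-2(i-1)/d}.
\]
% Each vector is rotated by its own absolute position, but the attention
% score depends only on the offset n - m, because R_m^{\top} R_n = R_{n-m}.
\[
(R_m q)^{\top} (R_n k) = q^{\top} R_m^{\top} R_n k = q^{\top} R_{n-m} k .
\]
```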

Abstract

Transformer-based end-to-end speech recognition models have received considerable attention in recent years due to their high training speed and ability to model a long-range global context. Position embedding in the transformer architecture is indispensable because it provides supervision for dependency modeling between elements at different positions in the input sequence. To make use of the time order of the input sequence, many works inject some information about the relative or absolute position of the element into the input sequence. In this work, we investigate various position embedding methods in the convolution-augmented transformer (conformer) and adopt a novel implementation named rotary position embedding (RoPE). RoPE encodes absolute positional information into the input sequence by a rotation matrix, and then naturally incorporates explicit relative position information…
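
The rotation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `rotary_embed`, the `offset` parameter, the base of 10000, and the pairing of adjacent feature dimensions are assumptions taken from the commonly used formulation of rotary embeddings.

```python
import numpy as np


def rotary_embed(x, offset=0, base=10000.0):
    """Rotate each row of x (seq_len, dim) by an angle set by its position.

    Hypothetical helper for illustration; `offset` shifts all positions and
    `base` sets the frequency spectrum (both are assumptions).
    """
    x = np.asarray(x, dtype=np.float64)
    seq_len, dim = x.shape
    if dim % 2 != 0:
        raise ValueError("dim must be even so features can be paired")

    # One rotation frequency per pair of adjacent feature dimensions.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)       # (dim/2,)
    positions = np.arange(seq_len) + offset                # absolute positions
    angles = positions[:, None] * inv_freq[None, :]        # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)

    # Apply a 2-D rotation to every (even, odd) feature pair.
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((6, 8))
    k = rng.standard_normal((6, 8))

    # Attention scores from rotated queries and keys.
    scores = rotary_embed(q) @ rotary_embed(k).T
    # Shifting every position by the same amount leaves the scores unchanged,
    # which is the relative-position property.
    shifted = rotary_embed(q, offset=5) @ rotary_embed(k, offset=5).T
    print(np.allclose(scores, shifted))
```

In this sketch the rotation is applied to queries and keys before the dot product, so each vector carries its absolute position while the resulting score depends only on the distance between the two positions.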

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Rotary Position Embedding