On the use of Performer and Agent Attention for Spoken Language Identification
Jitendra Kumar Dhiman, Jainag Ambati

TL;DR
This paper investigates Performer and Agent-Attention mechanisms combined with statistical pooling for spoken language identification (LID), finding that Performer-Attention outperforms vanilla self-attention at lower computational cost.
Contribution
It applies the recently proposed Performer and Agent-Attention mechanisms to LID and evaluates them against standard self-attention in terms of accuracy and efficiency.
Findings
Performer-Attention outperforms self-attention in LID tasks.
Agent-Attention performs comparably to, and occasionally better than, self-attention.
Performer-Attention is more computationally efficient.
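The efficiency finding follows from how Performer-style attention reorders the computation. The sketch below is a minimal NumPy illustration, not the paper's implementation: it approximates softmax attention with positive random features, using plain Gaussian projections rather than orthogonal ones for brevity, so that cost grows linearly with the number of frames T instead of quadratically. The function names, sizes, and random seed are illustrative assumptions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Vanilla softmax attention; builds a (T, T) matrix, so cost is O(T^2)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def positive_random_features(X, W):
    """phi(x) = exp(w.x - |x|^2 / 2) / sqrt(m); E[phi(x).phi(y)] = exp(x.y)."""
    m = W.shape[0]
    proj = X @ W.T                                         # (T, m)
    half_sq_norm = 0.5 * np.sum(X ** 2, axis=-1, keepdims=True)
    # Note: real implementations also subtract a max here for stability.
    return np.exp(proj - half_sq_norm) / np.sqrt(m)

def performer_attention(Q, K, V, W):
    """Linear-time approximation: never forms the (T, T) attention matrix."""
    d = Q.shape[-1]
    scale = d ** -0.25                                     # splits the 1/sqrt(d) factor
    q_feat = positive_random_features(Q * scale, W)        # (T, m)
    k_feat = positive_random_features(K * scale, W)        # (T, m)
    kv = k_feat.T @ V                                      # (m, d_v), cost O(T m d_v)
    normalizer = q_feat @ k_feat.sum(axis=0)               # (T,)
    return (q_feat @ kv) / normalizer[:, None]

# Illustrative sizes (assumptions): T frames, d dims, m random features.
rng = np.random.default_rng(0)
T, d, m = 50, 8, 4096
Q, K, V = (0.5 * rng.standard_normal((T, d)) for _ in range(3))
W = rng.standard_normal((m, d))

exact = softmax_attention(Q, K, V)
approx = performer_attention(Q, K, V, W)
print("max abs difference:", np.abs(exact - approx).max())
```

The key design choice is computing `k_feat.T @ V` first: the intermediate has shape (m, d_v) regardless of T, which is where the savings over the (T, T) score matrix come from on long utterances.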
Abstract
One approach to language identification (LID) derives speech representations from models pre-trained with self-supervised learning and then fine-tunes the model for the LID task. State-of-the-art LID approaches use an attention-based statistical pooling layer to aggregate contextual information across the time frames of the embedding vectors extracted from the pre-trained model. In this paper, we explore two recently proposed attention mechanisms, performer and agent-attention, in conjunction with the statistical pooling layer. The LID experiments are performed on three datasets: VoxPopuli, FLEURS, and VoxLingua. We compare their performance against vanilla self-attention. Our findings suggest that performer-attention outperforms self-attention and agent-attention exhibits comparable or occasionally superior performance to…
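To make the pooling step concrete, here is a minimal NumPy sketch of one common form of attention-based statistical pooling: per-frame attention weights produce a weighted mean and weighted standard deviation, which are concatenated into a single utterance-level vector. This is an assumption about the general technique, not the paper's exact layer; the scoring vector `w` and the function name are hypothetical.

```python
import numpy as np

def attentive_stats_pooling(H, w, eps=1e-8):
    """Pool frame embeddings H (T, D) into one vector of size 2*D.

    w is a hypothetical (D,) scoring vector; in a trained model the
    per-frame scores would come from a learned attention module.
    """
    scores = H @ w                          # (T,) one score per frame
    scores = scores - scores.max()          # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum()                    # attention weights, sum to 1
    mean = alpha @ H                        # (D,) weighted mean over time
    var = alpha @ (H - mean) ** 2           # (D,) weighted variance
    std = np.sqrt(var + eps)
    return np.concatenate([mean, std])      # (2*D,)

# Illustrative shapes (assumptions): 120 frames of 16-dim embeddings.
rng = np.random.default_rng(0)
H = rng.standard_normal((120, 16))
pooled = attentive_stats_pooling(H, rng.standard_normal(16))
print(pooled.shape)
```

With an all-zero `w` the weights become uniform and the output reduces to the plain mean and standard deviation over time, which is a useful sanity check.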
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Softmax · Attention Is All You Need · Fast Attention Via Positive Orthogonal Random Features · Performer
