Efficient Encoders for Streaming Sequence Tagging
Ayush Kaushal, Aditya Gupta, Shyam Upadhyay, Manaal Faruqui

TL;DR
This paper introduces HEAR, a hybrid encoder with adaptive restart that reduces computational cost (FLOP savings of up to 71.1% in streaming settings) and improves streaming sequence tagging performance over bidirectional encoders.
Contribution
The paper proposes HEAR, a novel hybrid unidirectional-bidirectional encoder with an adaptive restart mechanism, enhancing streaming sequence tagging efficiency and accuracy.
Findings
FLOP savings of up to 71.1% in streaming settings.
Outperforms bidirectional encoders by up to 10% in streaming exact match.
Effective across four sequence tagging tasks.
Abstract
Applying state-of-the-art bidirectional encoders naively to streaming sequence tagging requires re-encoding every token from scratch each time a new token arrives in an incremental streaming input (such as transcribed speech). Because previous computation cannot be reused, this leads to a higher number of floating point operations (FLOPs) and more unnecessary label flips. The increased FLOPs lead to higher wall-clock time, and the increased label flipping leads to poorer streaming performance. In this work, we present a Hybrid Encoder with Adaptive Restart (HEAR) that addresses these issues: it maintains the performance of bidirectional encoders on offline (complete) inputs while improving performance on streaming (incomplete) inputs. HEAR has a hybrid unidirectional-bidirectional encoder architecture to perform sequence tagging, along with an Adaptive Restart…
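The cost argument in the abstract can be illustrated with a rough count. The following is a minimal sketch, not the paper's method: it assumes that the number of token positions encoded is a crude proxy for FLOPs (real cost also depends on model size and attention), and that a label flip is any change in the predicted label of an already-seen token between consecutive prefixes. All function names are hypothetical, and the sketch does not model HEAR itself.

```python
from typing import Sequence


def naive_restart_cost(n_tokens: int) -> int:
    """Token positions encoded when every prefix is re-encoded from scratch.

    At step t the whole prefix of length t is encoded again,
    so the total is 1 + 2 + ... + n.
    """
    return sum(range(1, n_tokens + 1))


def incremental_cost(n_tokens: int) -> int:
    """Token positions encoded when earlier states are reused.

    Assumes a unidirectional encoder that only encodes the new token.
    """
    return n_tokens


def count_label_flips(prefix_predictions: Sequence[Sequence[str]]) -> int:
    """Count label changes on already-seen tokens between consecutive prefixes."""
    flips = 0
    for prev, curr in zip(prefix_predictions, prefix_predictions[1:]):
        # zip stops at the shorter sequence, so only shared positions are compared.
        flips += sum(1 for p, c in zip(prev, curr) if p != c)
    return flips


if __name__ == "__main__":
    n = 10
    print(naive_restart_cost(n), incremental_cost(n))

    # Illustrative (made-up) predictions for prefixes of length 1, 2, 3.
    predictions = [["O"], ["O", "B"], ["B", "I", "O"]]
    print(count_label_flips(predictions))
```

Under these assumptions the naive cost grows quadratically with input length while the reuse-based cost grows linearly, which is the gap the abstract attributes to the lack of reusable computation.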
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
