Hierarchical Attention Transformer Architecture For Syntactic Spell Correction
Abhishek Niranjan, M Ali Basha Shaik, Kushal Verma

TL;DR
This paper introduces a hierarchical attention transformer with multiple encoders for improved spell correction, achieving higher accuracy and faster training compared to existing models.
Contribution
It presents a novel multi-encoder transformer architecture that leverages character n-grams for enhanced spell correction performance.
Findings
Significant reduction in character (CER), word (WER), and sentence (SER) error rates.
Model trains approximately 7.8 times faster than comparable architectures.
Model size is about one-third of comparable architectures.
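To make the three reported metrics concrete, here is a minimal sketch of how CER, WER, and SER are commonly computed from reference and predicted sentences. It assumes Levenshtein distance normalized by reference length for CER and WER, and exact-match failure for SER. The paper's own evaluation script is not shown here, and the function names `edit_distance` and `error_rates` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(references, hypotheses):
    """Corpus-level CER, WER, and SER (assumed definitions, see lead-in)."""
    char_edits = char_total = 0
    word_edits = word_total = 0
    sentence_errors = 0
    for ref, hyp in zip(references, hypotheses):
        char_edits += edit_distance(ref, hyp)
        char_total += len(ref)
        ref_words, hyp_words = ref.split(), hyp.split()
        word_edits += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        sentence_errors += int(ref != hyp)
    return {
        "CER": char_edits / char_total,
        "WER": word_edits / word_total,
        "SER": sentence_errors / len(references),
    }


if __name__ == "__main__":
    refs = ["the cat sat", "hello world"]
    hyps = ["the cat sat", "helo world"]
    print(error_rates(refs, hyps))
```

SER is the strictest of the three: a single wrong character marks the whole sentence as an error, so it typically falls more slowly than CER or WER.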
Abstract
Attention mechanisms have driven recent advances in sequence-to-sequence problems. The Transformer architecture achieved new state-of-the-art results in machine translation, and its variants have since been introduced in several other sequence-to-sequence problems. Problems that involve a shared vocabulary can benefit from the similar semantic and syntactic structure of the source and target sentences. With the motivation of building a reliable and fast post-processing textual module to assist all text-related use cases on mobile phones, we take on the popular spell correction problem. In this paper, we propose a multi-encoder, single-decoder variation of the conventional transformer. Outputs from the three encoders, with character-level 1-gram, 2-gram and 3-gram inputs, are attended to in a hierarchical fashion in the decoder. The context vectors from the encoders clubbed…
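The character-level 1-gram, 2-gram, and 3-gram inputs mentioned in the abstract can be illustrated with a short sketch. It assumes overlapping sliding-window n-grams with no padding or boundary markers. The abstract does not specify the exact tokenization, and the helper names `char_ngrams` and `encoder_inputs` are hypothetical.

```python
def char_ngrams(text, n):
    """Return the overlapping character n-grams of `text` (sliding window, no padding)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def encoder_inputs(text):
    """Build one token sequence per encoder: 1-grams, 2-grams, and 3-grams.

    Assumption: each of the three encoders consumes one of these sequences.
    """
    return {n: char_ngrams(text, n) for n in (1, 2, 3)}


if __name__ == "__main__":
    for n, grams in encoder_inputs("speling").items():
        print(n, grams)
```

Under this assumption, a sentence of length L yields sequences of length L, L-1, and L-2, so the three encoders see views of the same text at slightly different granularities.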
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
