Hierarchical Attention Transformer Architecture For Syntactic Spell Correction

Abhishek Niranjan; M Ali Basha Shaik; Kushal Verma

arXiv:2005.04876 · cs.LG · May 12, 2020 · 1 citation

TL;DR

This paper introduces a hierarchical attention transformer with three encoders (character 1-gram, 2-gram, and 3-gram inputs) and a single decoder for spell correction, reporting lower error rates and faster training than comparable architectures.

Contribution

It presents a novel multi-encoder, single-decoder transformer architecture that uses character n-grams (1-, 2-, and 3-grams) as encoder inputs to improve spell-correction performance.
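To make "character n-gram inputs" concrete, here is a minimal sketch of how one input string could be turned into three token sequences, one per encoder. It assumes overlapping n-grams with stride 1 and no boundary padding; the paper's actual tokenization, padding, and vocabulary handling are not given in this listing, and `char_ngrams`, `encoder_inputs`, and `to_ids` are hypothetical helper names.

```python
from typing import Dict, List, Sequence


def char_ngrams(text: str, n: int) -> List[str]:
    """Overlapping character n-grams of `text` (sliding window, stride 1).

    Assumptions: no boundary padding, and a non-empty string shorter than n
    becomes a single token so every encoder still receives some input.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def encoder_inputs(text: str, orders: Sequence[int] = (1, 2, 3)) -> Dict[int, List[str]]:
    """One token sequence per encoder, keyed by n-gram order (hypothetical helper)."""
    return {n: char_ngrams(text, n) for n in orders}


def to_ids(tokens: List[str], vocab: Dict[str, int], unk_id: int = 0) -> List[int]:
    """Map tokens to integer ids; n-grams missing from `vocab` fall back to `unk_id`."""
    return [vocab.get(t, unk_id) for t in tokens]


if __name__ == "__main__":
    for n, toks in encoder_inputs("helo").items():
        print(n, toks)
    # 1 ['h', 'e', 'l', 'o']
    # 2 ['he', 'el', 'lo']
    # 3 ['hel', 'elo']
```

Under this assumption a string of length L yields sequences of length L, L-1, and L-2, so each encoder sees the same text at a different granularity and with its own vocabulary.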

Findings

1. Significant reductions in character, word, and sentence error rates (CER, WER, SER).

2. The model trains approximately 7.8 times faster.

3. The model is about one-third the size of comparable architectures.
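The findings are stated in terms of CER, WER, and SER. A minimal sketch of how these are commonly computed follows, assuming the usual definitions: CER and WER are corpus-level Levenshtein edit distances divided by the reference length in characters or whitespace-separated words, and SER is the fraction of sentences whose output differs from the reference at all. The paper's exact normalization is not stated here, so treat this as an illustration rather than its evaluation script.

```python
from typing import Dict, List, Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 when equal)
            )
        prev = cur
    return prev[-1]


def error_rates(refs: List[str], hyps: List[str]) -> Dict[str, float]:
    """Corpus-level CER, WER, and SER under the assumed standard definitions."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    char_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    char_total = sum(len(r) for r in refs)
    word_edits = sum(levenshtein(r.split(), h.split()) for r, h in zip(refs, hyps))
    word_total = sum(len(r.split()) for r in refs)
    sent_errors = sum(r != h for r, h in zip(refs, hyps))
    return {
        "CER": char_edits / max(char_total, 1),
        "WER": word_edits / max(word_total, 1),
        "SER": sent_errors / max(len(refs), 1),
    }


if __name__ == "__main__":
    print(error_rates(["the cat sat", "a dog"], ["the cat sat", "a dgo"]))
    # → {'CER': 0.125, 'WER': 0.2, 'SER': 0.5}
```

The example shows why the three metrics move differently: a single transposed word costs two character edits (2/16), one word edit (1/5), and makes one of two sentences wrong (1/2).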

Abstract

Attention mechanisms have played a major role in recent advances on sequence-to-sequence problems. The Transformer architecture achieved new state-of-the-art results in machine translation, and its variants have since been applied to several other sequence-to-sequence problems. Problems that involve a shared vocabulary can benefit from the similar semantic and syntactic structure of the source and target sentences. Motivated by the need for a reliable and fast post-processing text module to assist all text-related use cases on mobile phones, we take on the popular spell correction problem. In this paper, we propose a multi-encoder, single-decoder variation of the conventional Transformer. Outputs from the three encoders, which take character-level 1-gram, 2-gram, and 3-gram inputs, are attended in a hierarchical fashion in the decoder. The context vectors from the encoders clubbed…
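The abstract says the three encoder outputs are "attended in hierarchical fashion" but is truncated before the details. One common formulation of hierarchical attention over several encoders is sketched below for a single decoder query: first attend within each encoder's states to get one context vector per encoder, then attend over those per-encoder contexts to get a single combined context. This is an assumed formulation, not necessarily the paper's; it uses one head, no learned query/key/value projections, and equal dimensions everywhere, whereas a real Transformer decoder would use multi-head cross-attention with learned projections.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


def hierarchical_attention(query: Vector,
                           encoder_outputs: List[List[Vector]]) -> Tuple[Vector, List[float]]:
    """Two-level attention over several encoders (assumed formulation).

    Level 1: attend over each encoder's states separately -> one context per encoder.
    Level 2: attend over the per-encoder contexts -> combined context plus one
    weight per encoder.
    """
    per_encoder = [attend(query, states, states)[0] for states in encoder_outputs]
    return attend(query, per_encoder, per_encoder)


if __name__ == "__main__":
    # Hypothetical toy states for 1-, 2-, and 3-gram encoders of a 4-character input
    # (sequence lengths 4, 3, 2 under overlapping n-grams).
    enc1 = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.3, 0.7]]
    enc2 = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
    enc3 = [[0.7, 0.3], [0.4, 0.6]]
    context, encoder_weights = hierarchical_attention([1.0, 0.0], [enc1, enc2, enc3])
    print(context, encoder_weights)
```

One plausible reason to attend per encoder first is that each encoder has its own sequence length and n-gram vocabulary; the second level then yields an explicit weight for how much each n-gram view contributes at the current decoding step.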

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax