BSpell: A CNN-Blended BERT Based Bangla Spell Checker
Chowdhury Rafeed Rahman, MD. Hasibur Rahman, Samiha Zakir, Mohammad Rafsan, Mohammed Eunus Ali

TL;DR
BSpell is a novel Bangla spell checker that combines CNN and BERT, utilizing a hybrid pretraining scheme and specialized training to effectively correct spelling errors in Bangla text.
Contribution
It introduces a CNN-BERT hybrid model with a unique pretraining scheme and auxiliary loss for improved Bangla spelling correction.
Findings
Outperforms existing methods on Bangla and Hindi datasets
Effective in handling highly inflected Bangla vocabulary
Available as an open-source tool on GitHub
Abstract
Bangla typing is mostly performed using an English keyboard and can be highly error-prone due to the presence of compound and similarly pronounced letters. Correcting a misspelled word requires an understanding of the word's typing pattern as well as the context in which the word is used. This paper proposes a specialized BERT model named BSpell, targeted at word-for-word correction at the sentence level. BSpell contains an end-to-end trainable CNN sub-model named SemanticNet along with a specialized auxiliary loss. This allows BSpell to specialize in highly inflected Bangla vocabulary in the presence of spelling errors. Furthermore, a hybrid pretraining scheme is proposed for BSpell that combines word-level and character-level masking. Comparison on two Bangla and one Hindi spelling correction datasets shows the superiority of the proposed approach. BSpell is available as a Bangla…
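The hybrid pretraining scheme mentioned in the abstract combines word-level and character-level masking. The abstract gives no further detail, so the following is only a minimal sketch of what such a corruption step could look like. The function name `hybrid_mask`, the mask symbols, and all rates are hypothetical choices for illustration, not values taken from the paper.

```python
import random

WORD_MASK = "[MASK]"  # hypothetical word-level mask token (assumption)
CHAR_MASK = "#"       # hypothetical character-level mask symbol (assumption)


def hybrid_mask(words, word_rate=0.15, char_word_rate=0.15, char_rate=0.5, rng=None):
    """Corrupt a tokenized sentence with word-level and character-level masks.

    Each word is handled in one of three ways:
      * with probability `word_rate`, the whole word becomes WORD_MASK;
      * with probability `char_word_rate`, each of its characters is
        independently replaced by CHAR_MASK with probability `char_rate`;
      * otherwise the word is left unchanged.

    Returns (masked_words, targets). A target is the original word at
    corrupted positions and None where no prediction is required.
    All rates are illustrative assumptions, not the paper's settings.
    """
    rng = rng or random.Random()
    masked, targets = [], []
    for word in words:
        draw = rng.random()
        if draw < word_rate:
            # Word-level masking: hide the entire word.
            masked.append(WORD_MASK)
            targets.append(word)
        elif draw < word_rate + char_word_rate:
            # Character-level masking: hide a random subset of characters.
            # Note: this iterates over Unicode code points, so a Bangla
            # compound letter may be only partially masked.
            chars = [CHAR_MASK if rng.random() < char_rate else c for c in word]
            masked.append("".join(chars))
            targets.append(word)
        else:
            masked.append(word)
            targets.append(None)
    return masked, targets
```

Calling `hybrid_mask(sentence.split(), rng=random.Random(0))` on a whitespace-tokenized sentence gives a corrupted copy plus per-word targets that a model could be trained to recover. Word-level masks force reliance on sentence context, while character-level masks resemble partially misspelled words, which is one plausible reading of why the two are combined.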
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Dropout · Softmax
