TL;DR
This paper introduces a real-time, language-adaptable spell checker that suggests context-sensitive corrections, can be extended to new languages with minimal language-specific processing, and is reported to outperform industry-accepted spell checkers in 11 languages.
Contribution
A novel, adaptable spell checking system that works in real-time and can be extended to new languages with minimal language-specific processing.
Findings
Outperforms industry-accepted spell checkers in 11 languages
Generates suggestion lists for misspelled words from dictionaries built on Wikipedia articles and manually curated video subtitles
Effective on synthetic datasets for 24 languages
Abstract
We present a novel language-adaptable spell checking system that detects spelling errors and suggests context-sensitive corrections in real time. We show that our system can be extended to new languages with minimal language-specific processing. The available literature mostly discusses spell checkers for English, and no publicly available system can be extended to other languages out of the box. Most existing systems do not work in real time. We explain the process of generating a language's word dictionary and n-gram probability dictionaries using Wikipedia article data and manually curated video subtitles. We present the results of generating a list of suggestions for a misspelled word. We also propose three approaches to create noisy channel datasets of real-world typographic errors. We compare our system with industry-accepted spell checker tools for 11…
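As a rough illustration of the pipeline the abstract describes (a word dictionary plus n-gram probabilities, used to rank suggestions for a misspelled word by context), the idea can be sketched in Python. This is not the authors' implementation: the function names, the tiny in-memory corpus, the edit-distance-1 candidate generation, and the add-one smoothing are all assumptions chosen to keep the example short. A real system would build its dictionaries from Wikipedia and subtitle text and would need faster lookup to run in real time.

```python
import re
from collections import Counter


def tokenize(text):
    # Simplistic tokenizer (assumption): lowercase runs of Unicode letters.
    # Scripts that rely on combining marks would need a better tokenizer.
    return re.findall(r"[^\W\d_]+", text.lower())


def build_dictionaries(corpus_lines):
    # Word dictionary (unigram counts) and bigram counts from raw text lines.
    unigrams, bigrams = Counter(), Counter()
    for line in corpus_lines:
        tokens = tokenize(line)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_probability(prev, word, unigrams, bigrams):
    # Add-one smoothed estimate of P(word | prev); smoothing choice is an assumption.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))


def edits1(word, alphabet):
    # All strings one delete, transpose, replace, or insert away from `word`.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def suggest(prev, word, unigrams, bigrams, top_k=3):
    # The alphabet is derived from the dictionary itself, so nothing here is
    # specific to one language.
    alphabet = sorted({ch for w in unigrams for ch in w})
    candidates = [c for c in edits1(word, alphabet) if c in unigrams]
    candidates.sort(
        key=lambda c: bigram_probability(prev, c, unigrams, bigrams),
        reverse=True,
    )
    return candidates[:top_k]


# Toy corpus standing in for Wikipedia/subtitle text (hypothetical data).
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a car is on the road",
]
unigrams, bigrams = build_dictionaries(corpus)
print(suggest("the", "cak", unigrams, bigrams))  # ['cat', 'car']
print(suggest("a", "cak", unigrams, bigrams))    # ['car', 'cat']
```

The same misspelling, "cak", gets a different top suggestion depending on the preceding word, which is the sense in which the corrections are context-sensitive.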
