# Misspelling Oblivious Word Embeddings

**Authors:** Bora Edizel, Aleksandra Piktus, Piotr Bojanowski, Rui Ferreira, Edouard Grave, Fabrizio Silvestri

arXiv: 1905.09755 · 2019-05-24

## TL;DR

This paper introduces a method for learning word embeddings that are robust to misspellings. It combines FastText subword embeddings with a supervised task that learns misspelling patterns, and it improves results on both intrinsic and extrinsic NLP tasks.

## Contribution

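The paper integrates FastText subword embeddings with supervised learning of misspelling patterns, so that misspellings of a word are embedded close to its correct spelling. This makes the embeddings more resilient to malformed text, which contains many out-of-vocabulary words. The authors also release a new dataset for training such embeddings.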

## Key findings

- Embeddings of misspelled words lie close to the embeddings of their correct variants.
- Performance improves on NLP tasks involving misspelled text.
- The method shows advantages over standard embeddings on intrinsic and extrinsic tasks, evaluated on public test sets.
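A common way to quantify how "close" two embeddings are is cosine similarity. The minimal sketch below assumes FastText-style character n-grams (3 to 6 characters, with `<` and `>` boundary markers) and represents a word as the sum of its n-gram vectors. The n-gram vectors are untrained random stand-ins produced by a hypothetical `ngram_vector` helper, so the example only shows how shared subwords pull a misspelling toward its correct form; it does not reproduce the paper's trained embeddings.

```python
import math
import random
import zlib


def char_ngrams(word, minn=3, maxn=6):
    """FastText-style character n-grams of `word`, with '<' and '>' boundary
    markers; the whole bracketed word is kept as one extra token."""
    padded = f"<{word}>"
    grams = {padded}
    for n in range(minn, maxn + 1):
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return grams


def ngram_vector(gram, dim):
    """Hypothetical stand-in for a learned n-gram vector: a fixed random
    Gaussian vector seeded by a stable (CRC32) hash of the n-gram."""
    rng = random.Random(zlib.crc32(gram.encode("utf-8")))
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def embed(word, dim=300):
    """Sum of the word's n-gram vectors, so even an out-of-vocabulary
    misspelling gets a vector."""
    total = [0.0] * dim
    for gram in char_ngrams(word):
        for k, x in enumerate(ngram_vector(gram, dim)):
            total[k] += x
    return total


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


ref = embed("embedding")
print(f"embedding vs embeding: {cosine(ref, embed('embeding')):.2f}")
print(f"embedding vs keyboard: {cosine(ref, embed('keyboard')):.2f}")
```

Because `embeding` shares most of its n-grams with `embedding`, its summed vector points in a similar direction, while `keyboard` shares none and lands near zero similarity.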

## Abstract

In this paper we present a method to learn word embeddings that are resilient to misspellings. Existing word embeddings have limited applicability to malformed texts, which contain a non-negligible amount of out-of-vocabulary words. We propose a method combining FastText with subwords and a supervised task of learning misspelling patterns. In our method, misspellings of each word are embedded close to their correct variants. We train these embeddings on a new dataset we are releasing publicly. Finally, we experimentally show the advantages of this approach on both intrinsic and extrinsic NLP tasks using public test sets.
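The abstract does not give the training objective. The sketch below is one plausible reading of "FastText with subwords and a supervised task of learning misspelling patterns", under stated assumptions. It scores a (misspelling, correct word) pair by the dot product between the sum of the misspelling's character n-gram vectors and an output vector for the correct word. It trains that score with logistic loss and negative sampling, as in FastText's skip-gram training. `MisspellingModel`, its methods, the `weight` parameter, and the toy word pairs are all hypothetical, and the FastText corpus objective itself is omitted.

```python
import math
import random


def char_ngrams(word, minn=3, maxn=6):
    """FastText-style character n-grams with '<' and '>' boundary markers."""
    padded = f"<{word}>"
    grams = {padded}
    for n in range(minn, maxn + 1):
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return sorted(grams)  # sorted so iteration order (and RNG use) is deterministic


def sigmoid(x):
    # Clamp to avoid math.exp overflow for large |x|.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


class MisspellingModel:
    """Hypothetical toy model: one input vector per character n-gram and one
    output vector per correctly spelled word."""

    def __init__(self, vocab, dim=50, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.vocab = list(vocab)
        self.ngram_vecs = {}
        # Output vectors start at zero, as in word2vec/FastText training.
        self.word_vecs = {w: [0.0] * dim for w in self.vocab}

    def _ngram(self, gram):
        # Lazily initialise n-gram vectors with small uniform noise.
        if gram not in self.ngram_vecs:
            bound = 1.0 / self.dim
            self.ngram_vecs[gram] = [self.rng.uniform(-bound, bound) for _ in range(self.dim)]
        return self.ngram_vecs[gram]

    def represent(self, word):
        """Sum of the word's n-gram vectors; defined even for unseen misspellings."""
        total = [0.0] * self.dim
        for gram in char_ngrams(word):
            for k, x in enumerate(self._ngram(gram)):
                total[k] += x
        return total

    def score(self, misspelling, correct):
        """s(misspelling, correct): dot product of n-gram sum and word vector."""
        h = self.represent(misspelling)
        return sum(a * b for a, b in zip(h, self.word_vecs[correct]))

    def step(self, misspelling, correct, lr=0.05, negatives=3, weight=1.0):
        """One SGD step on logistic loss with negative sampling: push
        s(misspelling, correct) up and s(misspelling, sampled word) down.
        `weight` is a hypothetical coefficient for mixing this term with a
        FastText objective (not implemented here)."""
        h = self.represent(misspelling)
        grad_h = [0.0] * self.dim
        others = [w for w in self.vocab if w != correct]
        targets = [(correct, 1.0)] + [(self.rng.choice(others), 0.0) for _ in range(negatives)]
        for word, label in targets:
            out = self.word_vecs[word]
            s = sum(a * b for a, b in zip(h, out))
            g = weight * lr * (label - sigmoid(s))
            for k in range(self.dim):
                grad_h[k] += g * out[k]  # uses the output vector before its update
                out[k] += g * h[k]
        # h is a plain sum, so every n-gram vector receives the full gradient.
        for gram in char_ngrams(misspelling):
            vec = self._ngram(gram)
            for k in range(self.dim):
                vec[k] += grad_h[k]


model = MisspellingModel(["embedding", "keyboard", "language", "sentence"])
pairs = [("embeding", "embedding"), ("langauge", "language"), ("sentense", "sentence")]
for _ in range(200):
    for typo, word in pairs:
        model.step(typo, word)
print(f"s(embeding, embedding) = {model.score('embeding', 'embedding'):.2f}")
print(f"s(embeding, keyboard)  = {model.score('embeding', 'keyboard'):.2f}")
```

In a full system this term would be trained jointly with the FastText objective on a text corpus, using real misspelling pairs such as those in the released dataset; the four-word vocabulary and three pairs here are invented purely for illustration.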

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.09755/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1905.09755/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1905.09755/full.md

---
Source: https://tomesphere.com/paper/1905.09755