Levenshtein Distance Technique in Dictionary Lookup Methods: An Improved Approach

Rishin Haldar; Debajyoti Mukhopadhyay

arXiv:1101.1232·cs.IT·January 7, 2011·74 cites

Open Access

TL;DR

This paper introduces an improved Levenshtein distance method for dictionary lookup, enhancing accuracy by grouping similar characters, which reduces search overhead and improves recognition of ambiguous OCR letters.

Contribution

The paper proposes a novel modification to the Levenshtein distance technique by grouping similar characters, leading to better performance in dictionary lookup tasks.

Findings

01 Marked improvement over traditional Levenshtein distance

02 Reduced search overhead in dictionary lookup

03 Enhanced recognition of ambiguous OCR characters

Abstract

Dictionary lookup methods are a popular way to resolve ambiguous letters that Optical Character Readers fail to recognize. However, a robust dictionary lookup method can be complex, because a priori probability calculation or a large dictionary increases the overhead and cost of searching. In this context, Levenshtein distance is a simple metric that can serve as an effective approximate string matching tool. After observing the effectiveness of this method, it was improved by grouping some similar-looking letters and reducing the weighted difference among members of the same group. The results showed marked improvement over the traditional Levenshtein distance technique.
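The grouping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the character groups in `SIMILAR_GROUPS`, the reduced cost `IN_GROUP_COST`, and the helper names are all hypothetical, chosen only to show how a lower substitution weight inside a group changes which dictionary word is closest.

```python
# Hypothetical groups of visually similar characters (assumption: the
# paper's actual groups and weights are not given on this page).
SIMILAR_GROUPS = [set("O0DQ"), set("Il1"), set("S5"), set("B8"), set("Z2")]
IN_GROUP_COST = 0.5  # assumed reduced cost for substituting within a group


def substitution_cost(a: str, b: str) -> float:
    """Return 0 for identical characters, a reduced cost for characters
    in the same similarity group, and 1 otherwise."""
    if a == b:
        return 0.0
    for group in SIMILAR_GROUPS:
        if a in group and b in group:
            return IN_GROUP_COST
    return 1.0


def weighted_levenshtein(s: str, t: str) -> float:
    """Levenshtein distance with unit insert/delete cost and a
    group-aware substitution cost, computed row by row."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                          # delete a
                curr[j - 1] + 1.0,                      # insert b
                prev[j - 1] + substitution_cost(a, b),  # substitute a -> b
            ))
        prev = curr
    return prev[-1]


def best_match(word: str, dictionary: list[str]) -> str:
    """Return the dictionary entry with the smallest weighted distance."""
    return min(dictionary, key=lambda w: weighted_levenshtein(word, w))
```

With these assumed groups, an OCR output such as `"C0T"` is at distance 0.5 from `"COT"` but 1.0 from `"CAT"`, so `best_match("C0T", ["CAT", "COT"])` picks `"COT"`. Under the traditional unit-cost distance both candidates would tie at 1.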

Taxonomy

Topics: Natural Language Processing Techniques · Lexicography and Language Studies · Algorithms and Data Compression