Faster Algorithm of String Comparison

Qi Xiao Yang; Sung Sam Yuan; Lu Chun; Li Zhao; Sun Peng

arXiv:cs/0112022 · cs.DS · May 23, 2007 · 6 cites

Open Access

TL;DR

This paper introduces substring-based algorithms for string similarity that outperform existing token-based methods in accuracy and efficiency, achieving lower time complexity and better results in practical applications.

Contribution

The paper presents novel substring-based algorithms that improve accuracy and reduce time complexity for Field Similarity compared to prior token-based approaches.

Findings

01. Achieves a time complexity of O(knm) with k < 0.75 in the worst case

02. Demonstrates higher accuracy through theoretical analysis and experiments

03. Significantly improves computation speed for string similarity tasks

Abstract

In many applications, it is necessary to determine the similarity of strings. The edit distance approach [WF74] is a classic method for determining Field Similarity. A well-known dynamic programming algorithm [GUS97] calculates edit distance with time complexity O(nm) in the worst, average, and even best case. Instead of continuing to improve the edit distance approach, [LL+99] adopted a brand-new token-based approach. Its new concept of a token base, which retains the original semantic information, its good time complexity of O(nm) (for the worst, average, and best case), and its good experimental performance make it a milestone paper in this area. Further study indicates that there is still room for improvement in its Field Similarity algorithm. This paper introduces a package of new substring-based algorithms to determine Field Similarity. Combined together, our new algorithms not only…
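For context, the O(nm) dynamic programming baseline that the abstract cites can be sketched as follows. This is a minimal illustration of the classic edit distance recurrence, assuming unit costs for insertion, deletion, and substitution; the function name `edit_distance` is chosen here for illustration only, and the sketch is not the paper's substring-based algorithm.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (assumed unit costs)."""
    n, m = len(a), len(b)
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b; row 0 is the cost of j insertions.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        # Column 0 of row i is the cost of i deletions.
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a[i-1]
                curr[j - 1] + 1,     # insert b[j-1]
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[m]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Every one of the n × m cells is filled regardless of the input, which is why the abstract describes this approach as O(nm) in the best case as well as the worst.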

Taxonomy

Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Web Data Mining and Analysis