Counting common substrings effectively

Stanisław Goldstein; Piotr Beling

arXiv:1209.4771·cs.DS·September 24, 2012

TL;DR

This paper introduces an efficient dynamic algorithm that counts the substrings of one string that also occur in a second string, which enables fast computation of string similarity by the generalized n-gram method (Niewiadomski's measure).

Contribution

The paper presents a novel dynamic algorithm for counting shared substrings, with proven correctness and complexity analysis, applicable to string similarity calculations.

Findings

01

The algorithm efficiently counts the substrings of one string that also occur in another.

02

Correctness and complexity analyses are included.

03

Applicable to generalized n-gram similarity measures.

Abstract

This article presents an efficient (dynamic) algorithm for counting the substrings of a given string that are also substrings of a second string. The algorithm can be used, for example, to quickly compute a string similarity measure by the generalized $n$-gram method (the Niewiadomski measure), as the article shows. Correctness and complexity analyses are included. ----- Polish-language abstract (translated): The article presents an efficient (dynamic) algorithm that computes a word similarity measure using the generalized $n$-gram method (the Niewiadomski measure). The correctness of the algorithm is also justified and its computational complexity is estimated.
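To make the counting problem concrete, here is a minimal Python sketch. It is not the algorithm from the paper, whose details are not given on this page; it is a textbook quadratic dynamic program over longest common extensions. The function names `count_common_substrings` and `ngram_similarity` are hypothetical. The sketch assumes that substrings are counted by position (each pair of start index and length in the first string counts once), and that the generalized n-gram measure normalizes this count by N(N+1)/2, where N is the length of the longer string; both assumptions should be checked against the paper.

```python
def count_common_substrings(s: str, t: str) -> int:
    """Count (start, length) pairs in s whose substring also occurs in t.

    Illustrative O(len(s) * len(t)) dynamic program, not the paper's algorithm.
    """
    m = len(t)
    # prev[j] holds the longest common prefix length of s[i+1:] and t[j:].
    prev = [0] * (m + 1)
    total = 0
    for i in range(len(s) - 1, -1, -1):
        cur = [0] * (m + 1)
        best = 0  # longest prefix of s[i:] that occurs somewhere in t
        for j in range(m - 1, -1, -1):
            if s[i] == t[j]:
                cur[j] = prev[j + 1] + 1
                best = max(best, cur[j])
        # If a substring of length `best` starting at i occurs in t,
        # so does every shorter one starting at i: `best` matches in total.
        total += best
        prev = cur
    return total


def ngram_similarity(s: str, t: str) -> float:
    """Generalized n-gram similarity under the normalization assumed above."""
    n = max(len(s), len(t))
    if n == 0:
        return 1.0  # convention chosen here for two empty strings (assumption)
    return 2 * count_common_substrings(s, t) / (n * n + n)
```

For example, `count_common_substrings("ab", "ba")` counts "a" and "b" but not "ab", giving 2, and two identical strings get a similarity of 1.0 under the assumed normalization.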

Taxonomy

Topics: Authorship Attribution and Profiling · Algorithms and Data Compression · Natural Language Processing Techniques