Fuzzy Segmentations of a String
Armen Kostanyan, Arevik Harmandayan

TL;DR
This paper introduces a heuristic algorithm for fuzzy text segmentation that finds groups of adjacent segments matching a fuzzy pattern, proves that the algorithm finds every match in the special case of fuzzy string matching, and uses dynamic programming to address the best overall segmentation of the text.
Contribution
The paper presents a novel heuristic algorithm utilizing prefix structures for fuzzy text segmentation and proves its completeness in the case of fuzzy string matching.
Findings
The heuristic algorithm finds all matching segments in fuzzy string matching.
Dynamic programming effectively determines the best overall text segmentation.
The approach treats fuzzy text segmentation as a particular case of data clustering, with fuzzy string matching as its unit-length special case.
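The dynamic-programming finding can be illustrated with a minimal sketch. The summary does not state the paper's objective function, so everything here is an assumption made for illustration: a fuzzy property is modeled as a hypothetical function from a segment to a degree in [0, 1], the quality of a segmentation is taken to be the minimum degree over its segments, and `max_len` is a hypothetical bound on segment length. This is not the authors' formulation.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model: a fuzzy property maps a text segment to a degree in [0, 1].
SegmentProperty = Callable[[str], float]


def best_segmentation(
    text: str, pattern: List[SegmentProperty], max_len: int
) -> Optional[Tuple[float, List[str]]]:
    """Split `text` into len(pattern) adjacent segments, each of length
    1..max_len, maximizing the minimum membership degree (an assumed objective).
    Returns None when no such split covers the whole text."""
    n, k = len(text), len(pattern)
    unreachable = -1.0
    # best[j][i]: best degree when the first j properties cover text[:i].
    best = [[unreachable] * (n + 1) for _ in range(k + 1)]
    # back[j][i]: start index of the j-th segment in that best solution.
    back = [[-1] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 1.0
    for j in range(1, k + 1):
        prop = pattern[j - 1]
        for i in range(1, n + 1):
            for length in range(1, min(max_len, i) + 1):
                prev = best[j - 1][i - length]
                if prev < 0:
                    continue  # the prefix cannot be covered by j - 1 segments
                degree = min(prev, prop(text[i - length:i]))
                if degree > best[j][i]:
                    best[j][i] = degree
                    back[j][i] = i - length
    if best[k][n] < 0:
        return None
    # Walk the back-pointers from the end of the text to recover the segments.
    segments: List[str] = []
    i = n
    for j in range(k, 0, -1):
        start = back[j][i]
        segments.append(text[start:i])
        i = start
    segments.reverse()
    return best[k][n], segments


def mostly_digits(segment: str) -> float:
    """Illustrative fuzzy property: fraction of digit characters."""
    return sum(ch.isdigit() for ch in segment) / len(segment)


def mostly_letters(segment: str) -> float:
    """Illustrative fuzzy property: fraction of alphabetic characters."""
    return sum(ch.isalpha() for ch in segment) / len(segment)
```

For example, `best_segmentation("12ab", [mostly_digits, mostly_letters], 3)` returns `(1.0, ["12", "ab"])`. The table has (k + 1)(n + 1) cells and each cell tries at most `max_len` segment lengths, so the sketch evaluates O(k · n · max_len) candidate segments.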
Abstract
This article discusses a particular case of the data clustering problem, where it is necessary to find groups of adjacent text segments of the appropriate length that match a fuzzy pattern represented as a sequence of fuzzy properties. To solve this problem, a heuristic algorithm for finding a sufficiently large number of solutions is proposed. The key idea of the proposed algorithm is the use of the prefix structure to track the process of mapping text segments to fuzzy properties. An important special case of the text segmentation problem is the fuzzy string matching problem, when adjacent text segments have unit length and, accordingly, the fuzzy pattern is a sequence of fuzzy properties of text characters. It is proven that the heuristic segmentation algorithm in this case finds all text segments that match the fuzzy pattern. Finally, we consider the problem of a best…
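To make the fuzzy string matching special case concrete, here is a minimal sketch under stated assumptions. It is a naive sliding-window scan, not the prefix-structure algorithm the abstract describes. The names `fuzzy_matches`, `vowel_like`, and `consonant_like` are hypothetical, a fuzzy property is modeled as a function from a character to a degree in [0, 1], a window's degree is assumed to be the minimum over its characters, and a window counts as a match when that degree reaches an assumed `threshold`.

```python
from typing import Callable, List, Tuple

# Hypothetical model: a fuzzy property maps a character to a degree in [0, 1].
CharProperty = Callable[[str], float]


def fuzzy_matches(
    text: str, pattern: List[CharProperty], threshold: float
) -> List[Tuple[int, float]]:
    """Return (start, degree) for every window of len(pattern) characters whose
    minimum membership degree is at least `threshold` (assumed match rule)."""
    m = len(pattern)
    results: List[Tuple[int, float]] = []
    for start in range(len(text) - m + 1):
        degree = 1.0
        for offset, prop in enumerate(pattern):
            degree = min(degree, prop(text[start + offset]))
            if degree < threshold:
                break  # this window can no longer reach the threshold
        else:
            results.append((start, degree))
    return results


def vowel_like(ch: str) -> float:
    """Illustrative fuzzy property: 'y' is treated as half a vowel."""
    return {"a": 1.0, "e": 1.0, "i": 1.0, "o": 1.0, "u": 1.0, "y": 0.5}.get(ch, 0.0)


def consonant_like(ch: str) -> float:
    """Illustrative fuzzy property: complement of vowel_like on letters."""
    return 1.0 - vowel_like(ch) if ch.isalpha() else 0.0
```

With these illustrative properties, `fuzzy_matches("byte", [consonant_like, vowel_like], 0.5)` returns `[(0, 0.5), (2, 1.0)]`: the window "by" matches only to degree 0.5 because "y" is a partial vowel, while "te" matches fully.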
Taxonomy
Topics: Advanced Data Processing Techniques
