Efficient Global String Kernel with Random Features: Beyond Counting Substructures
Lingfei Wu, Ian En-Hsu Yen, Siyu Huo, Liang Zhao, Kun Xu, Liang Ma, Shouling Ji, Charu Aggarwal

TL;DR
This paper introduces a novel global string kernel using random features that captures long-range patterns, maintains positive-definiteness, and scales linearly with string length and dataset size, improving efficiency and accuracy.
Contribution
The authors propose a new class of global string kernels based on random features, enabling efficient, positive-definite similarity measures that capture global properties and scale linearly with data size.
Findings
Achieves accuracy better than or comparable to state-of-the-art methods.
Scales linearly with string length and dataset size.
Effectively captures long-range patterns in strings.
Abstract
Analysis of large-scale sequential data has been one of the most crucial tasks in areas such as bioinformatics, text, and audio mining. Existing string kernels, however, either (i) rely on local features of short substructures in the string, which hardly capture long discriminative patterns, (ii) sum over too many substructures, such as all possible subsequences, which leads to diagonal dominance of the kernel matrix, or (iii) rely on non-positive-definite similarity measures derived from the edit distance. Furthermore, while there have been works addressing the computational challenge with respect to the length of string, most of them still experience quadratic complexity in terms of the number of training samples when used in a kernel-based classifier. In this paper, we present a new class of global string kernels that aims to (i) discover global properties hidden in the strings…
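The truncated abstract does not spell out how the random features are constructed. As a rough illustration only, the sketch below assumes one plausible construction: each string is embedded by its edit distance to a small set of short random strings, and the kernel is approximated by the inner product of two embeddings. The function names, the `gamma` parameter, the random-string sampler, and the exponential feature map are all assumptions made for this example, not details taken from the summary.

```python
import math
import random


def edit_distance(a, b):
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete from a
                            curr[j - 1] + 1,            # insert into a
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def sample_random_strings(alphabet, num_features, max_len, rng):
    """Hypothetical sampler: short random strings with lengths in [1, max_len]."""
    return ["".join(rng.choice(alphabet) for _ in range(rng.randint(1, max_len)))
            for _ in range(num_features)]


def embed(s, anchors, gamma):
    """Assumed feature map: one soft-similarity feature per random string."""
    scale = 1.0 / math.sqrt(len(anchors))
    return [scale * math.exp(-gamma * edit_distance(s, w)) for w in anchors]


def approx_kernel(u, v):
    """Kernel estimate as a plain inner product of two embeddings."""
    return sum(x * y for x, y in zip(u, v))


rng = random.Random(0)
anchors = sample_random_strings("ACGT", num_features=32, max_len=5, rng=rng)
x = embed("ACGTACGTTTGA", anchors, gamma=0.5)
y = embed("ACGTACGATTGA", anchors, gamma=0.5)
print(approx_kernel(x, y))
```

Under these assumptions, a Gram matrix built from such inner products is positive semi-definite by construction, which is one way a distance-based similarity can avoid the non-positive-definite issue noted in the abstract. Each string is embedded once against a fixed number of short random strings, so the cost grows linearly with both string length and the number of strings, and the embeddings can be passed to a linear classifier instead of forming a full kernel matrix.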
Taxonomy
Topics: Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques · Advanced Data Compression Techniques
