Efficient multivariate sequence classification
Pavel P. Kuksa

TL;DR
This paper introduces a novel multivariate sequence classification method that extends univariate kernel functions to real-valued multivariate data, achieving significant accuracy improvements in music and protein sequence classification tasks.
Contribution
It proposes the MVDFQ-SK kernel method, which combines feature quantization and multivariate discrete kernels for efficient and accurate multivariate sequence classification.
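This summary does not say how MVDFQ-SK quantizes features or combines dimensions, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's algorithm. It assumes uniform binning over a fixed range, a plain contiguous k-mer (spectrum) kernel per feature dimension, and a sum over dimensions. The names `quantize`, `kmer_counts`, `spectrum_kernel`, and `multivariate_kernel`, and the parameters `n_bins`, `lo`, and `hi`, are hypothetical.

```python
from collections import Counter

import numpy as np


def quantize(seq, n_bins, lo, hi):
    """Map each value of a (T, D) real-valued sequence to a bin index in [0, n_bins)."""
    seq = np.asarray(seq, dtype=float)
    # Interior edges of n_bins equal-width bins over [lo, hi] (assumed scheme).
    edges = np.linspace(lo, hi, n_bins + 1)[1:-1]
    # Values below lo fall into bin 0, values above hi into bin n_bins - 1.
    return np.digitize(seq, edges)


def kmer_counts(symbols, k):
    """Count contiguous k-mers in a 1-D sequence of discrete symbols."""
    symbols = [int(s) for s in symbols]
    return Counter(tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1))


def spectrum_kernel(a, b, k):
    """Inner product of the k-mer count vectors of two discrete sequences."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * cb[kmer] for kmer, n in ca.items())


def multivariate_kernel(x, y, k=2, n_bins=4, lo=0.0, hi=1.0):
    """Quantize both sequences, then sum per-dimension spectrum kernels (assumed combination)."""
    qx, qy = quantize(x, n_bins, lo, hi), quantize(y, n_bins, lo, hi)
    return sum(spectrum_kernel(qx[:, d], qy[:, d], k) for d in range(qx.shape[1]))


# Two toy sequences with T = 3 time steps and D = 2 features each.
x = np.array([[0.1, 0.9], [0.3, 0.9], [0.1, 0.9]])
y = np.array([[0.1, 0.1], [0.3, 0.1], [0.6, 0.1]])
print(multivariate_kernel(x, y))
```

A sum of valid kernels is itself a valid kernel, so the resulting Gram matrix can be passed to any kernel classifier that accepts precomputed kernels. Treating each dimension independently is the simplest choice here; it ignores co-occurrence of symbols across dimensions, which the actual method may handle differently.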
Findings
Achieved 25-40% accuracy improvements over existing methods.
Demonstrated effectiveness on music and protein sequence datasets.
Provided a scalable approach for multivariate sequence analysis.
Abstract
Kernel-based approaches for sequence classification have been successfully applied to a variety of domains, including text categorization, image classification, speech analysis, biological sequence analysis, and time series and music classification, where they show some of the most accurate results. Typical kernel functions for sequences in these domains (e.g., bag-of-words, mismatch, or subsequence kernels) are restricted to discrete univariate (i.e., one-dimensional) string data, such as sequences of words in text analysis, codeword sequences in image analysis, or nucleotide or amino acid sequences in DNA and protein sequence analysis. However, original sequence data are often real-valued and multivariate, i.e., not univariate and discrete as required by typical k-mer based sequence kernel functions. In this work, we consider the problem of the…
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
