Beta-CoRM: A Bayesian Approach for $n$-gram Profiles Analysis
José A. Perusquía, Jim E. Griffin, Cristiano Villa

TL;DR
Beta-CoRM introduces a Bayesian generative model for n-gram profile analysis that enables probabilistic data representation and effective feature selection, improving classification accuracy in sequence analysis.
Contribution
The paper presents a novel Bayesian model for n-gram profiles that allows for probabilistic analysis and integrated feature selection, addressing limitations of traditional machine learning methods.
Findings
Feature selection can improve classification accuracy on both synthetic and real data.
The model effectively analyzes binary n-gram profile data.
Fast inference is achieved through a slice sampling algorithm.
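To make the phrase "binary n-gram profile data" concrete, the following is a minimal sketch of how such a profile might be built from raw sequences. The function names `ngrams` and `binary_profiles` and the toy sequences are illustrative assumptions and do not come from the paper; the paper's own preprocessing may differ.

```python
from itertools import chain


def ngrams(seq, n):
    """Return the set of contiguous n-grams (as tuples) found in a sequence."""
    # A sequence shorter than n yields an empty set.
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def binary_profiles(sequences, n):
    """Map each sequence to a 0/1 vector over every n-gram seen in any sequence.

    Hypothetical helper: entry j of a row is 1 if the j-th n-gram of the
    vocabulary occurs at least once in that sequence, and 0 otherwise.
    """
    per_sequence = [ngrams(s, n) for s in sequences]
    vocab = sorted(set(chain.from_iterable(per_sequence)))
    matrix = [[1 if g in grams else 0 for g in vocab] for grams in per_sequence]
    return vocab, matrix


# Toy sequences of differing lengths (made up for illustration).
sequences = ["abab", "abcabc", "cc"]
vocab, X = binary_profiles(sequences, n=2)
```

Each row of `X` has the same length regardless of the length of the original sequence, which is what allows sequences of differing lengths to be compared through a common set of binary attributes.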
Abstract
$n$-gram profiles have been successfully and widely used to analyse long sequences of potentially differing lengths for clustering or classification. Machine learning algorithms have mainly been used for this purpose but, despite their predictive performance, these methods cannot discover hidden structures or provide a full probabilistic representation of the data. A novel class of Bayesian generative models for $n$-gram profiles used as binary attributes has been designed to address this. The flexibility of the proposed modelling allows a straightforward approach to feature selection within the generative model. Furthermore, a slice sampling algorithm is derived for fast inference; applied to synthetic and real data scenarios, it shows that feature selection can improve classification accuracy.
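For readers unfamiliar with slice sampling, the sketch below shows a generic univariate slice sampler using the standard stepping-out and shrinkage steps. It is not the algorithm derived in the paper, which is tailored to the proposed model; the function name `slice_sample`, the `width` parameter, and the standard normal target are assumptions chosen only to illustrate the general idea.

```python
import math
import random


def slice_sample(log_density, x0, n_samples, width=1.0, rng=None):
    """Generic univariate slice sampler (illustrative, not the paper's sampler).

    log_density: log of an unnormalised target density.
    x0: starting point with positive density.
    width: initial bracket width used for stepping out.
    """
    rng = rng or random.Random()
    x = x0
    samples = []
    for _ in range(n_samples):
        # Draw the auxiliary height uniformly under the density at x (log scale).
        log_y = log_density(x) + math.log(1.0 - rng.random())

        # Stepping out: place a bracket around x, then widen until both ends
        # lie outside the slice {x : log_density(x) > log_y}.
        left = x - width * rng.random()
        right = left + width
        while log_density(left) > log_y:
            left -= width
        while log_density(right) > log_y:
            right += width

        # Shrinkage: propose uniformly in the bracket, shrinking it towards x
        # after each rejected proposal.
        while True:
            proposal = rng.uniform(left, right)
            if log_density(proposal) > log_y:
                x = proposal
                break
            if proposal < x:
                left = proposal
            else:
                right = proposal
        samples.append(x)
    return samples


# Usage: draw from a standard normal, specified up to a constant.
rng = random.Random(0)
draws = slice_sample(lambda x: -0.5 * x * x, x0=0.0, n_samples=5000, rng=rng)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

The appeal of this family of samplers is that the target only needs to be evaluated up to a normalising constant and no proposal distribution has to be tuned by hand.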
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Bayesian Methods and Mixture Models
