A model selection approach for multiple sequence segmentation and dimensionality reduction

Bruno M. de Castro; Florencia Leonardi

arXiv:1501.01756 · stat.ME · January 9, 2015 · J. Multivar. Anal. · 1 cites

TL;DR

This paper introduces a penalized maximum likelihood criterion for segmenting $n$ aligned sequences of length $m$ into independent blocks. The estimator is computed exactly by a dynamic programming algorithm and approximated by a faster hierarchical algorithm, and it is proved to be almost surely consistent.

Contribution

It treats sequence segmentation and dimensionality reduction as a single model selection problem, with a proof of consistency and two practical algorithms.

Findings

01

The dynamic programming algorithm runs in time $O(m^2 n)$ and the hierarchical algorithm in time $O(mn)$.

02

The estimator is almost surely consistent, and the hierarchical algorithm converges, as the sample size $n$ grows to infinity.

03

The method is applied to an Ebola virus protein sequence alignment.

Abstract

In this paper we consider the problem of segmenting $n$ aligned random sequences of equal length $m$ into a finite number of independent blocks. We propose to use a penalized maximum likelihood criterion to infer simultaneously the number of points of independence as well as the position of each of these points. We show how to compute the estimator efficiently by means of a dynamic programming algorithm with time complexity $O(m^{2} n)$. We also propose another algorithm, called the hierarchical algorithm, that provides an approximation to the estimator when the sample size increases and runs in time $O(mn)$. Our main theoretical result is the proof of almost sure consistency of the estimator and the convergence of the hierarchical algorithm when the sample size $n$ grows to infinity. We illustrate the convergence of these algorithms through some simulation examples and we apply the…
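
The dynamic programming idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `block_loglik` and `segment`, the constant `c`, and the penalty form (number of free parameters of a block times `c * log(n)`) are hypothetical choices made here, since the abstract does not specify the penalty. The sketch also recomputes block counts from scratch for every candidate block, so it is slower than the $O(m^{2} n)$ bound quoted above.

```python
import math
from collections import Counter


def block_loglik(rows, i, j):
    """Maximized log-likelihood of columns i..j-1 treated as one joint block."""
    n = len(rows)
    counts = Counter(tuple(r[i:j]) for r in rows)
    return sum(k * math.log(k / n) for k in counts.values())


def segment(rows, alphabet_size, c=1.0):
    """Return (cut positions, penalized score) maximizing the criterion.

    Assumption: the penalty for a block of length L is
    c * (alphabet_size**L - 1) * log(n); the paper's penalty may differ.
    """
    n, m = len(rows), len(rows[0])

    def cost(i, j):
        penalty = c * (alphabet_size ** (j - i) - 1) * math.log(n)
        return block_loglik(rows, i, j) - penalty

    # best[j]: best penalized score for the first j columns.
    best = [0.0] + [-math.inf] * m
    back = [0] * (m + 1)
    for j in range(1, m + 1):
        for i in range(j):
            score = best[i] + cost(i, j)
            if score > best[j]:
                best[j], back[j] = score, i

    # Walk the back-pointers to recover the interior cut points.
    cuts, j = [], m
    while j > 0:
        j = back[j]
        if j > 0:
            cuts.append(j)
    return sorted(cuts), best[m]


if __name__ == "__main__":
    # Columns 0-1 are copies of one fair bit, columns 2-3 of another.
    rows = [(a, a, b, b) for a in (0, 1) for b in (0, 1)] * 100
    print(segment(rows, alphabet_size=2)[0])  # prints [2]
```

In the toy data the two pairs of columns are independent of each other, so the recovered cut sits between columns 1 and 2. Because the penalty grows exponentially with block length, very long candidate blocks are effectively ruled out; a practical version would likely cap the block length.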

Taxonomy

Topics: Algorithms and Data Compression · Bayesian Methods and Mixture Models · Machine Learning and Algorithms