Fast estimation of the ICL criterion for change-point detection problems with applications to Next-Generation Sequencing data
Alice Cleynen, The Minh Luong, Guillem Rigaill, Gregory Nuel

TL;DR
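This paper introduces a computationally efficient way to estimate the Integrated Completed Likelihood (ICL) criterion used to choose the number of segments in change-point detection. It lowers the cost from O(Kn^2) to O(Kn) operations, which makes large datasets such as NGS data practical to analyze under any chosen model distribution.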
Contribution
We develop a general framework to estimate the ICL in O(Kn) operations, linear in the number of data points n. It accommodates any given model distribution, which makes ICL-based change-point detection feasible on large-scale data.
Findings
The proposed method reduces the cost of computing the ICL from O(Kn^2) to O(Kn) operations.
Its run-time and validity are checked on simulated data.
It performs well on real NGS data analyzed with a negative binomial model.
Abstract
In this paper, we consider the Integrated Completed Likelihood (ICL) as a useful criterion for estimating the number of changes in the underlying distribution of data in problems where detecting the precise location of these changes is the main goal. The exact computation of the ICL requires O(Kn^2) operations (with K the number of segments and n the number of data points), which is prohibitive in many practical situations with long sequences of data. We describe a framework to estimate the ICL with O(Kn) complexity. Our approach is general in the sense that it can accommodate any given model distribution. We check the run-time and validity of our approach on simulated data and demonstrate its good performance when analyzing real Next-Generation Sequencing (NGS) data using a negative binomial model.
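The abstract states the complexity gain but not the mechanism. Below is a minimal sketch of one way to reach O(Kn), under assumptions that are ours rather than the paper's: each segment k has fixed, already-estimated parameters, so the K-segment model behaves like a left-to-right hidden Markov model in which segment k can only be followed by segment k+1; all C(n-1, K-1) segmentations are equally likely a priori; and the ICL is taken as -log p(Y|K) plus the entropy of the posterior over segmentations, which equals -E[log p(Y, m|K) | Y]. Under these assumptions one forward-backward pass gives the posterior probability that point t lies in segment k, and the criterion follows from those marginals, with each pass touching every (t, k) pair once. The names `icl_linear`, `gaussian_log_emit` and `logaddexp` are hypothetical, and the Gaussian emission is only a placeholder for whatever model distribution is used.

```python
import math

NEG_INF = -math.inf


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)), tolerating -inf."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def gaussian_log_emit(y, means, sigma=1.0):
    """Hypothetical helper: log N(y_t; means[k], sigma^2) for every t and k."""
    c = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return [[c - 0.5 * ((yt - mu) / sigma) ** 2 for mu in means] for yt in y]


def icl_linear(log_emit, K):
    """Sketch: ICL(K) = -E[log p(Y, m | K) | Y] in O(K n) by forward-backward.

    log_emit[t][k] is log f(y_t | theta_k) for FIXED segment parameters.
    Assumes a uniform prior over the C(n-1, K-1) segmentations into K
    contiguous segments (segment k is followed only by segment k+1).
    """
    n = len(log_emit)
    if not 1 <= K <= n:
        raise ValueError("need 1 <= K <= n")
    # Forward: alpha[t][k] = log of the summed likelihood of y_0..y_t over
    # all paths that start in segment 0 and are in segment k at time t.
    alpha = [[NEG_INF] * K for _ in range(n)]
    alpha[0][0] = log_emit[0][0]
    for t in range(1, n):
        for k in range(K):
            prev = alpha[t - 1][k]
            if k > 0:
                prev = logaddexp(prev, alpha[t - 1][k - 1])
            alpha[t][k] = prev + log_emit[t][k]
    # Backward: beta[t][k] = log of the summed likelihood of y_{t+1}..y_{n-1}
    # over all paths from segment k at time t that end in segment K-1.
    beta = [[NEG_INF] * K for _ in range(n)]
    beta[n - 1][K - 1] = 0.0
    for t in range(n - 2, -1, -1):
        for k in range(K):
            val = beta[t + 1][k] + log_emit[t + 1][k]
            if k + 1 < K:
                val = logaddexp(val, beta[t + 1][k + 1] + log_emit[t + 1][k + 1])
            beta[t][k] = val
    log_z = alpha[n - 1][K - 1]  # log of the sum over segmentations of p(Y | m)
    if log_z == NEG_INF:
        raise ValueError("data have zero likelihood under every segmentation")
    # Posterior expectation of log p(Y | m), built from the marginals gamma_t(k).
    expected = 0.0
    for t in range(n):
        for k in range(K):
            lp = alpha[t][k] + beta[t][k]
            if lp != NEG_INF:
                expected += math.exp(lp - log_z) * log_emit[t][k]
    # -log p(m | K) = log C(n-1, K-1) under the uniform segmentation prior.
    log_binom = math.lgamma(n) - math.lgamma(K) - math.lgamma(n - K + 1)
    return log_binom - expected
```

For example, `icl_linear(gaussian_log_emit(y, [0.0, 3.0]), 2)` scores a two-segment model of a list `y` with assumed segment means 0 and 3. Because the function only consumes a table of log-densities, replacing `gaussian_log_emit` with a negative binomial log-pmf would bring the sketch closer to the NGS setting in the abstract; the segment parameters themselves would still have to come from a separate fitting step.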
Taxonomy
Topics: Bayesian Methods and Mixture Models · Algorithms and Data Compression · Genetic Associations and Epidemiology
