PeakSegJoint: fast supervised peak detection via joint segmentation of multiple count data samples
Toby Dylan Hocking, Guillaume Bourque

TL;DR
PeakSegJoint is a supervised peak detection method for multiple genomic count data samples. It handles any number of sample types, places overlapping peaks at exactly the same positions across samples, and matches the accuracy of state-of-the-art algorithms while running faster.
Contribution
The paper introduces a constrained maximum likelihood segmentation model for any number of sample types, together with a supervised penalty learning model that selects the number of peaks in the segmentation.
Findings
Achieves similar accuracy to state-of-the-art methods
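Runs faster than the state-of-the-art peak detection algorithms it was compared against
Yields a more interpretable model, in which overlapping peaks occur in exactly the same positions across all samples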
Abstract
Joint peak detection is a central problem when comparing samples in genomic data analysis, but current algorithms for this task are unsupervised and limited to at most 2 sample types. We propose PeakSegJoint, a new constrained maximum likelihood segmentation model for any number of sample types. To select the number of peaks in the segmentation, we propose a supervised penalty learning model. To infer the parameters of these two models, we propose to use a discrete optimization heuristic for the segmentation, and convex optimization for the penalty learning. In comparisons with state-of-the-art peak detection algorithms, PeakSegJoint achieves similar accuracy, faster speeds, and a more interpretable model with overlapping peaks that occur in exactly the same positions across all samples.
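To make the idea of a joint segmentation concrete, here is a minimal sketch in Python. It is not the authors' discrete optimization heuristic: it is an exhaustive search over one common peak interval, assuming a Poisson loss, at most one peak per region, and a peak mean that must exceed the means on both sides. The function names (`poisson_loss`, `best_joint_peak`) and the fixed `penalty` argument are hypothetical; in the paper the penalty is learned from labeled data rather than set by hand.

```python
import math


def poisson_loss(counts):
    """Poisson negative log-likelihood (up to a constant) of one segment,
    with the segment mean set to the empirical mean of the counts."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0  # all counts are zero, so the log term vanishes
    return sum(mean - c * math.log(mean) for c in counts)


def best_joint_peak(samples, penalty=0.0):
    """Exhaustive search for one peak [start, end) shared by all samples.

    samples: list of equal-length lists of non-negative counts.
    penalty: hypothetical cost added for each sample that uses the peak.
    Returns (start, end, has_peak, cost); start and end are None when the
    flat model (no peak in any sample) is at least as good.
    """
    n_pos = len(samples[0])
    flat = [poisson_loss(row) for row in samples]
    best = (None, None, [False] * len(samples), sum(flat))
    for start in range(1, n_pos - 1):
        for end in range(start + 1, n_pos):
            cost = 0.0
            has_peak = []
            for row, flat_loss in zip(samples, flat):
                before, peak, after = row[:start], row[start:end], row[end:]
                peak_mean = sum(peak) / len(peak)
                # Constraint: the mean goes up at start and down at end.
                up_down = (peak_mean > sum(before) / len(before)
                           and peak_mean > sum(after) / len(after))
                peak_cost = (poisson_loss(before) + poisson_loss(peak)
                             + poisson_loss(after) + penalty)
                use_peak = up_down and peak_cost < flat_loss
                has_peak.append(use_peak)
                cost += peak_cost if use_peak else flat_loss
            if cost < best[3]:
                best = (start, end, has_peak, cost)
    return best


# Toy data: two samples share a peak at positions 3..5, the third is flat.
toy = [
    [1, 1, 1, 10, 10, 10, 1, 1, 1],
    [2, 2, 2, 20, 20, 20, 2, 2, 2],
    [3, 3, 3, 3, 3, 3, 3, 3, 3],
]
start, end, has_peak, cost = best_joint_peak(toy)
print(start, end, has_peak)  # 3 6 [True, True, False]
```

Because every sample with a peak is forced to use the same `start` and `end`, the result can be read directly as "which samples have a peak here". Raising `penalty` reduces the number of samples assigned a peak, which is the quantity a learned penalty would control.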
Taxonomy
Topics: Gene expression and cancer classification · Genomics and Chromatin Dynamics · Genomics and Phylogenetic Studies
