Tail-Greedy Unbalanced Haar Wavelet Segmentation for Copy Number Alteration Data

Maharani Ahsani Ummi; Stuart Barber; Henry M. Wood; and Arief Gusnanto

arXiv:2604.22364 · stat.AP · April 27, 2026

TL;DR

This paper introduces TGUHm, a modified tail-greedy unbalanced Haar segmentation method for detecting copy number alterations (CNAs). Its dual-thresholding strategy reduces false positives while remaining sensitive to short segments in noisy sequencing data.

Contribution

The study presents a dual-thresholding tail-greedy unbalanced Haar approach that outperforms existing methods in detecting CNAs, especially short aberrations, in noisy data.

Findings

01

TGUHm achieves higher true positive rates than CBS, HaarSeg, and FDRSeg.

02

In simulations under Gaussian and heavy-tailed noise, the method achieves lower false positive rates than CBS, HaarSeg, and FDRSeg.

03

Application to real cancer data reveals biologically relevant CNAs.

Abstract

Detecting copy number alterations (CNAs) from next-generation sequencing data remains challenging, particularly for short segments under noisy conditions. Existing segmentation methods often suffer from high false positive rates or fail to reliably detect short aberrations, especially in low-coverage data. In this study, we propose a modified tail-greedy unbalanced Haar (TGUHm) method that introduces a dual-thresholding strategy to improve segmentation accuracy. The proposed approach effectively suppresses spurious spikes while preserving sensitivity to both short and long CNA segments. Extensive simulation studies under Gaussian and heavy-tailed noise demonstrate that TGUHm consistently achieves higher true positive rates and lower false positive rates than state-of-the-art methods, including CBS, HaarSeg, and FDRSeg. In particular, the proposed method improves detection…
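To make the unbalanced Haar idea behind the abstract concrete, here is a minimal Python sketch. It is not the authors' TGUHm: the dual-thresholding rule is not described in the abstract and is not reproduced here, and the tail-greedy step (merging many pairs per pass) is replaced by a simple one-merge-at-a-time loop with a single threshold. The function names `uh_detail` and `greedy_merge_segments` and the `threshold` parameter are hypothetical, chosen for illustration. The only formula used is the standard unbalanced Haar detail coefficient for two adjacent segments of lengths n1 and n2, which equals sqrt(n1*n2/(n1+n2)) times the difference of the segment means.

```python
import math


def uh_detail(sum_left, n_left, sum_right, n_right):
    """Unbalanced Haar detail coefficient for two adjacent segments.

    Equals sqrt(n1*n2/(n1+n2)) * (mean_left - mean_right), so it is
    large in magnitude when the two segment means differ.
    """
    scale = math.sqrt(n_left * n_right / (n_left + n_right))
    return scale * (sum_left / n_left - sum_right / n_right)


def greedy_merge_segments(x, threshold):
    """Simplified bottom-up segmentation (assumption: single threshold).

    Repeatedly merges the adjacent pair of segments with the smallest
    absolute detail coefficient and stops once that smallest value
    exceeds `threshold`. Returns the start index of every segment
    after the first, i.e. the estimated breakpoints.
    """
    # Each segment is stored as (sum of values, number of points).
    segs = [(float(v), 1) for v in x]
    while len(segs) > 1:
        details = [
            abs(uh_detail(*segs[i], *segs[i + 1]))
            for i in range(len(segs) - 1)
        ]
        i = min(range(len(details)), key=details.__getitem__)
        if details[i] > threshold:
            break  # every remaining boundary looks like a real change
        merged = (segs[i][0] + segs[i + 1][0], segs[i][1] + segs[i + 1][1])
        segs[i:i + 2] = [merged]

    breakpoints = []
    position = 0
    for _, length in segs[:-1]:
        position += length
        breakpoints.append(position)
    return breakpoints
```

For a noiseless step signal, `greedy_merge_segments([0, 0, 0, 0, 5, 5, 5, 5], 1.0)` returns `[4]`. The loop is quadratic in the signal length, which is fine for illustration but is exactly the cost that a tail-greedy scheme is meant to avoid.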
