FL-Sailer: Efficient and Privacy-Preserving Federated Learning for Scalable Single-Cell Epigenetic Data Analysis via Adaptive Sampling
Guangyi Zhang, Yi Dai, Yiyun He, Junhao Liu

TL;DR
FL-Sailer is a novel federated learning framework tailored for single-cell epigenetic data, addressing high dimensionality, sparsity, and heterogeneity while preserving privacy and improving analysis efficiency.
Contribution
FL-Sailer introduces adaptive leverage score sampling and an invariant VAE architecture, enabling scalable, privacy-preserving multi-institutional epigenomic analysis with theoretical guarantees.
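As a rough illustration of the leverage score sampling idea, the sketch below scores each feature (peak) of a cell-by-peak matrix by its weight in the top singular subspace, then samples 20% of the features in proportion to those scores. This is plain, single-site leverage score sampling under assumed settings: the function names, the rank of 10, and the toy binary matrix are all hypothetical, and the summary does not say how FL-Sailer makes the sampling adaptive or coordinates it across institutions.

```python
import numpy as np

def leverage_scores(X, rank):
    """Normalized rank-`rank` leverage scores for the columns (features) of X."""
    # Rows of vt are right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v_k = vt[:rank].T                    # shape: (n_features, rank)
    scores = np.sum(v_k ** 2, axis=1)    # squared row norms of V_k
    return scores / scores.sum()         # normalize into sampling probabilities

def sample_features(X, rank=10, keep_fraction=0.2, seed=0):
    """Sample feature indices without replacement, proportional to leverage."""
    rng = np.random.default_rng(seed)
    probs = leverage_scores(X, rank)
    n_keep = max(1, int(round(keep_fraction * X.shape[1])))
    # Cannot draw more distinct features than have nonzero probability.
    n_keep = min(n_keep, int(np.count_nonzero(probs)))
    idx = rng.choice(X.shape[1], size=n_keep, replace=False, p=probs)
    return np.sort(idx), probs

# Toy stand-in for a sparse binary accessibility matrix (cells x peaks).
# The sizes and the 5% density are assumptions chosen only for illustration.
data_rng = np.random.default_rng(42)
X = (data_rng.random((200, 500)) < 0.05).astype(float)

idx, probs = sample_features(X, rank=10, keep_fraction=0.2)
X_reduced = X[:, idx]                    # keeps 20% of the features
```

Sampling in proportion to the scores, rather than keeping only the top-scoring features, is one common choice; a deterministic top-k selection would be an equally simple variant.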
Findings
Reduces feature dimensionality by 80% using adaptive sampling.
Outperforms centralized methods on synthetic and real datasets.
Enables multi-institutional collaboration in epigenomic research.
Abstract
Single-cell ATAC-seq (scATAC-seq) enables high-resolution mapping of chromatin accessibility, yet privacy regulations and data size constraints hinder multi-institutional sharing. Federated learning (FL) offers a privacy-preserving alternative, but faces three fundamental barriers in scATAC-seq analysis: ultra-high dimensionality, extreme sparsity, and severe cross-institutional heterogeneity. We propose FL-Sailer, the first FL framework designed for scATAC-seq data. FL-Sailer integrates two key innovations: (i) adaptive leverage score sampling, which selects biologically interpretable features while reducing dimensionality by 80%, and (ii) an invariant VAE architecture, which disentangles biological signals from technical confounders via mutual information minimization. We provide a convergence guarantee, showing that FL-Sailer converges to an approximate solution of the original…
