Domain-aware priors stabilize, not merely enable, vertical federated learning in data-scarce coral multi-omics
Sam Victor

TL;DR
This paper shows that domain-aware priors improve the stability and interpretability of vertical federated learning (VFL) for data-scarce multi-omics coral stress classification, and that the resulting framework outperforms the standard VFL baselines it is compared against.
Contribution
The paper introduces REEF (Robust Expert Encoder Federation), a domain-aware VFL framework that uses biologically motivated priors to improve stability and performance on small-sample, high-dimensional biological data.
Findings
REEF achieves higher AUROC than baselines in coral stress classification.
Biological priors reduce variance and improve stability of VFL models.
Domain-informed dimensionality reduction is crucial for data-scarce regimes.
Abstract
Vertical federated learning (VFL) enables multi-laboratory collaboration on distributed multi-omics datasets without sharing raw data, but exhibits severe instability under extreme data scarcity (P >> N) when applied generically. Here, we investigate how domain-aware design choices, specifically gradient saliency-guided feature selection with biologically motivated priors, affect the stability, interpretability, and failure modes of VFL architectures in small-sample coral stress classification (N = 13 samples, P = 90,579 features across transcriptomics, proteomics, metabolomics, and microbiome data). We benchmark REEF (Robust Expert Encoder Federation), a domain-aware VFL framework, against two baselines on the Montipora capitata thermal stress dataset: (i) a standard NVFlare-based VFL and (ii) LASER, a state-of-the-art label-aware VFL method. REEF achieves an AUROC of 0.776 ± 0.039…
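The abstract names gradient saliency-guided feature selection but does not spell out how it is computed. As a rough illustration only, and not REEF's actual implementation, the sketch below ranks features by the mean absolute gradient of a logistic loss with respect to each input and keeps the top k. The function name `saliency_select`, the optional `prior_weight` vector (a stand-in for a biologically motivated prior), the logistic model, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def saliency_select(X, y, k, prior_weight=None, steps=200, l2=1e-2):
    """Hypothetical helper: keep the k features with the largest gradient saliency.

    Saliency here is the mean absolute gradient of a logistic loss with
    respect to each (standardized) input feature. `prior_weight` is an
    assumed per-feature multiplier standing in for a domain prior.
    """
    n, p = X.shape
    # Standardize so gradients are comparable across features.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

    w, b = np.zeros(p), 0.0
    # Conservative step size from a Lipschitz bound on the gradient,
    # so the loop stays stable even when p is much larger than n.
    lr = 1.0 / (0.25 * p + l2)
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        err = prob - y
        w -= lr * (Z.T @ err / n + l2 * w)  # L2-regularized gradient step
        b -= lr * err.mean()

    # Per-sample gradient of the loss w.r.t. the input is (prob - y) * w.
    prob = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
    grads = (prob - y)[:, None] * w[None, :]
    saliency = np.abs(grads).mean(axis=0)
    if prior_weight is not None:
        saliency = saliency * prior_weight  # assumed way to inject a prior

    top_k = np.argsort(saliency)[::-1][:k]
    return np.sort(top_k), saliency

# Synthetic P >> N example (13 samples, 500 features); not the coral data.
rng = np.random.default_rng(0)
y = np.array([0, 1] * 6 + [0], dtype=float)
X = rng.normal(size=(13, 500))
X[:, 0] += 3.0 * (2 * y - 1)  # feature 0 carries the class signal

selected, saliency = saliency_select(X, y, k=10)
```

In a vertical setting, each party would presumably run a step like this on its own feature block before any embeddings are exchanged; how REEF actually combines saliency with its priors is not described in the truncated abstract above.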
Taxonomy
Topics: Single-cell and spatial transcriptomics · Bioinformatics and Genomic Networks · Gene expression and cancer classification
