Weighted Diversified Sampling for Efficient Data-Driven Single-Cell Gene-Gene Interaction Discovery
Yifan Wu, Yuntao Yang, Zirui Liu, Zhao Li, Khushbu Pahwa, Rongbin Li,, Wenjin Zheng, Xia Hu, Zhaozhuo Xu

TL;DR
This paper introduces a weighted diversified sampling method combined with Transformer models to efficiently identify significant gene-gene interactions in single-cell data, reducing data requirements by 99% while maintaining high performance.
Contribution
The study presents a novel weighted diversified sampling algorithm that enhances data efficiency in gene interaction discovery using Transformer models.
Findings
Sampling 1% of data yields comparable results to full dataset
The sampling algorithm computes diversity scores in two passes
Efficient subset generation accelerates gene interaction analysis
Abstract
Gene-gene interactions play a crucial role in the manifestation of complex human diseases. Uncovering significant gene-gene interactions is a challenging task. Here, we present an innovative approach utilizing data-driven computational tools, leveraging an advanced Transformer model, to unearth noteworthy gene-gene interactions. Despite the efficacy of Transformer models, their parameter intensity presents a bottleneck in data ingestion, hindering data efficiency. To mitigate this, we introduce a novel weighted diversified sampling algorithm. This algorithm computes the diversity score of each data sample in just two passes of the dataset, facilitating efficient subset generation for interaction discovery. Our extensive experimentation demonstrates that by sampling a mere 1\% of the single-cell dataset, we achieve performance comparable to that of utilizing the entire dataset.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSingle-cell and spatial transcriptomics · Gene expression and cancer classification · Gene Regulatory Network Analysis
MethodsDense Connections · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Attention Is All You Need · Adam · Linear Layer · Softmax · Multi-Head Attention · Dropout
