Unlocking Efficiency: Adaptive Masking for Gene Transformer Models
Soumyadeep Roy, Shamik Sural, Niloy Ganguly

TL;DR
This paper introduces a curriculum masking strategy for gene transformer models that improves training efficiency and representation quality, enabling comparable performance with fewer training steps.
Contribution
It proposes CM-GEMS, a novel curriculum masking approach based on mutual information, enhancing gene model training efficiency and downstream task performance.
Findings
CM-GEMS outperforms baseline masking methods in gene classification tasks.
Models trained with CM-GEMS reach similar accuracy in fewer steps.
Curriculum learning significantly reduces training time for gene transformers.
Abstract
Gene transformer models such as Nucleotide Transformer, DNABert, and LOGO are trained to learn optimal gene sequence representations by using the Masked Language Modeling (MLM) training objective over the complete Human Reference Genome. However, the typical tokenization methods employ a basic sliding window of tokens, such as k-mers, that fail to utilize gene-centric semantics. This could result in the (trivial) masking of easily predictable sequences, leading to inefficient MLM training. Time-variant training strategies are known to improve pretraining efficiency in both language and vision tasks. In this work, we focus on using curriculum masking, where we systematically increase the difficulty of the masked token prediction task by using a Pointwise Mutual Information-based difficulty criterion, as gene sequences lack well-defined semantic units similar to words or sentences in NLP…
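To make the idea in the abstract concrete, here is a minimal Python sketch of PMI-guided curriculum masking over sliding-window k-mers. It is not the CM-GEMS implementation: the function names (kmer_tokens, pmi_table, curriculum_mask), the linear schedule, the 15% mask rate, and the choice to treat jointly masking a high-PMI adjacent pair as the "hard" case are all assumptions made for illustration. PMI here is the standard quantity log p(a, b) / (p(a) p(b)) estimated from adjacent-token counts.

```python
import math
import random
from collections import Counter


def kmer_tokens(seq, k=3):
    """Overlapping sliding-window k-mers with stride 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def pmi_table(corpus_tokens):
    """Estimate PMI for every adjacent token pair seen in the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for toks in corpus_tokens:
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    return {
        (a, b): math.log((c / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        for (a, b), c in bigrams.items()
    }


def curriculum_mask(tokens, pmi, progress, mask_rate=0.15, rng=None):
    """Pick positions to mask; `progress` in [0, 1] is the training fraction.

    Assumed schedule: a share of the mask budget equal to `progress` goes to
    the highest-PMI adjacent pairs (masked jointly), the rest to random tokens.
    """
    rng = rng or random.Random(0)
    budget = max(1, round(mask_rate * len(tokens)))
    span_budget = int(progress * budget)

    # Rank adjacent pairs from highest to lowest PMI.
    ranked = sorted(
        range(len(tokens) - 1),
        key=lambda i: pmi.get((tokens[i], tokens[i + 1]), float("-inf")),
        reverse=True,
    )
    masked = set()
    for i in ranked:
        if len(masked) + 2 > span_budget:
            break
        if i in masked or i + 1 in masked:
            continue  # skip pairs overlapping an already masked span
        masked.update((i, i + 1))

    # Fill whatever budget is left with uniformly random positions.
    rest = [i for i in range(len(tokens)) if i not in masked]
    rng.shuffle(rest)
    masked.update(rest[:budget - len(masked)])
    return sorted(masked)


tokens = kmer_tokens("ACGTACGTACGTTTGACA", k=3)
pmi = pmi_table([tokens])
early = curriculum_mask(tokens, pmi, progress=0.0)  # random masking only
late = curriculum_mask(tokens, pmi, progress=1.0)   # high-PMI pairs first
```

With stride-1 k-mers, neighbouring tokens share k-1 characters, so a single masked token can often be read off its neighbours; masking correlated tokens together removes that shortcut, which is one plausible reading of why a PMI criterion raises task difficulty.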
Taxonomy
Topics: Evolutionary Algorithms and Applications · Machine Learning and Data Classification · Gene Regulatory Network Analysis
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections
