Efficient and Scalable Fine-Tune of Language Models for Genome Understanding
Huixin Zhan, Ying Nian Wu, Zijun Zhang

TL;DR
Lingo introduces an efficient, scalable fine-tuning method that adapts natural language foundation models for genome understanding, outperforming existing methods on multiple tasks with minimal additional parameters.
Contribution
The paper proposes Lingo, a novel prefix fine-tuning approach with adaptive rank sampling, enabling effective genome understanding using language models with minimal task-specific parameters.
Findings
Outperforms existing fine-tuning methods on 14 genome tasks
Uses fewer than 2% of trainable parameters for adaptation
Achieves comparable or better performance than DNA foundation models
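To put the "fewer than 2%" figure in context, the following is a minimal sketch of how a trainable-parameter fraction can be computed for a generic low-rank adapter attached to one frozen weight matrix. It is not the authors' implementation: the function name `low_rank_trainable_fraction`, the matrix sizes, and the rank are all hypothetical values chosen for illustration, and the summary above does not state the actual model dimensions or adapter layout used by Lingo.

```python
def low_rank_trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Ratio of adapter parameters to frozen parameters for one weight matrix.

    Assumes a generic low-rank adapter made of two thin factors
    (d_in x rank and rank x d_out); this is an illustrative assumption,
    not a description of the paper's exact parameterization.
    """
    frozen = d_in * d_out              # parameters in the frozen base matrix
    adapter = rank * (d_in + d_out)    # parameters in the two trainable factors
    return adapter / frozen


# Hypothetical sizes for illustration only (not taken from the paper).
fraction = low_rank_trainable_fraction(d_in=4096, d_out=4096, rank=8)
print(f"Trainable fraction: {fraction:.4%}")
```

Under these assumed sizes the adapter stays well below a 2% budget, and the ratio grows linearly with the chosen rank, which is why the rank is the main knob for trading capacity against parameter count.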
Abstract
Although DNA foundation models have advanced the understanding of genomes, they still face significant challenges in the limited scale and diversity of genomic data. This limitation starkly contrasts with the success of natural language foundation models, which thrive on substantially larger scales. Furthermore, genome understanding involves numerous downstream genome annotation tasks with inherent data heterogeneity, thereby necessitating more efficient and robust fine-tuning methods tailored for genomics. Here, we present Lingo: Language prefix fIne-tuning for GenOmes. Unlike DNA foundation models, Lingo strategically leverages natural language foundation models' contextual cues, recalibrating their linguistic knowledge to genomic sequences. Lingo further accommodates numerous, heterogeneous downstream fine-tune tasks by…
Taxonomy
Topics: Machine Learning in Bioinformatics
