Efficient and Scalable Fine-Tune of Language Models for Genome Understanding

Huixin Zhan; Ying Nian Wu; Zijun Zhang

arXiv:2402.08075 · q-bio.GN · February 14, 2024 · 1 citation

TL;DR

Lingo is an efficient, scalable fine-tuning method that adapts natural language foundation models to genome understanding. It outperforms existing fine-tuning methods on 14 genome tasks while training fewer than 2% of the parameters.

Contribution

The paper proposes Lingo, a novel prefix fine-tuning approach with adaptive rank sampling, enabling effective genome understanding using language models with minimal task-specific parameters.

Findings

01 Outperforms existing fine-tuning methods on 14 genome tasks

02 Uses fewer than 2% of trainable parameters for adaptation

03 Achieves performance comparable to or better than DNA foundation models

Abstract

Although DNA foundation models have advanced the understanding of genomes, they still face significant challenges in the limited scale and diversity of genomic data. This limitation starkly contrasts with the success of natural language foundation models, which thrive on substantially larger scales. Furthermore, genome understanding involves numerous downstream genome annotation tasks with inherent data heterogeneity, thereby necessitating more efficient and robust fine-tuning methods tailored for genomics. Here, we present Lingo: Language prefix fIne-tuning for GenOmes. Unlike DNA foundation models, Lingo strategically leverages natural language foundation models' contextual cues, recalibrating their linguistic knowledge to genomic sequences. Lingo further accommodates numerous, heterogeneous downstream fine-tune tasks by…

Code & Models

Repositories

zhanglab-aim/lingo
PyTorch · Official

Taxonomy

Topics: Machine Learning in Bioinformatics