BioArc: Discovering Optimal Neural Architectures for Biological Foundation Models
Yi Fang, Haoran Xu, Jiaxin Han, Sirui Ding, Yizhi Wang, Yue Wang, Xuan Wang

TL;DR
BioArc introduces a systematic neural architecture search framework tailored for biological data, enabling the discovery of high-performance models that outperform repurposed general AI architectures in biology.
Contribution
The paper presents BioArc, a novel NAS-based framework for automated, principled architecture discovery specifically designed for biological foundation models.
Findings
Identified novel architectures with superior performance on biological tasks.
Established empirical design principles for biological neural architectures.
Developed methods to predict optimal architectures for new biological tasks.
Abstract
Foundation models have revolutionized various fields such as natural language processing (NLP) and computer vision (CV). While efforts have been made to transfer the success of the foundation models in general AI domains to biology, existing works focus on directly adopting the existing foundation model architectures from general machine learning domains without a systematic design considering the unique physicochemical and structural properties of each biological data modality. This leads to suboptimal performance, as these repurposed architectures struggle to capture the long-range dependencies, sparse information, and complex underlying "grammars" inherent to biological data. To address this gap, we introduce BioArc, a novel framework designed to move beyond intuition-driven architecture design towards principled, automated architecture discovery for biological foundation models.…
Peer Reviews
Decision: Submitted to ICLR 2026
* **Problem Significance**: The paper correctly identifies a key bottleneck in computational biology: the reliance on intuition-driven or repurposed architectures. The move towards a principled, automated discovery process is a strong and important research direction.
* **Comprehensive Analysis**: The study's greatest strength is its systematic evaluation. It does not just search for a single best model but rigorously analyzes the "interplay between architecture, tokenization, and training strat
My main concerns are focused on the clarity of the supernet training methodology and the presentation of some results.
* **Major Question on Supernet Training Dynamics**: The paper adopts a "Single Path One-Shot" approach where a single path is sampled and updated in each step. This raises two critical questions about training stability and fairness:
  * Training Imbalance: In a vast search space, some shared blocks (e.g., a final-layer block) may be part of significantly more candidate paths th
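The imbalance concern raised in this review can be illustrated with a toy simulation. The sketch below is not the paper's method: the search space (`BLOCK_TYPES`, `MAX_DEPTH`), the variable-depth sampling rule, and all function names are assumptions made purely for illustration. It samples one path per step, as single-path training does, and counts how often each shared block would receive an update.

```python
import random
from collections import Counter

# Hypothetical toy search space (an assumption, not taken from the paper):
# a path has 1 to MAX_DEPTH layers, and each layer picks one candidate block.
BLOCK_TYPES = ["block_a", "block_b", "block_c"]
MAX_DEPTH = 4


def sample_path(rng):
    """Sample one path: a depth, then one block type per active layer."""
    depth = rng.randint(1, MAX_DEPTH)  # inclusive on both ends
    return [(layer, rng.choice(BLOCK_TYPES)) for layer in range(depth)]


def count_block_updates(num_steps, seed=0):
    """Count how often each shared (layer, block) pair is on the sampled path."""
    rng = random.Random(seed)
    updates = Counter()
    for _ in range(num_steps):
        # In actual supernet training, only the weights on the sampled
        # path would receive a gradient step at this point.
        updates.update(sample_path(rng))
    return updates


def updates_per_layer(updates):
    """Aggregate update counts over block types for each layer index."""
    per_layer = Counter()
    for (layer, _block), n in updates.items():
        per_layer[layer] += n
    return per_layer


if __name__ == "__main__":
    counts = updates_per_layer(count_block_updates(10_000))
    for layer in range(MAX_DEPTH):
        print(f"layer {layer}: {counts[layer]} updates")
```

Under these assumptions, blocks in layer 0 lie on every sampled path while deeper layers are reached less often, so update counts differ across shared blocks even though paths are sampled uniformly. Whether and how strongly this happens in BioArc depends on its actual search space and sampling scheme, which this sketch does not model.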
In my reading, I found the presented results to be impressive. The authors also uncovered useful and interesting architectural insights.
1. On the DNA-based GUE benchmark, BIOARC achieves **8-15 point absolute improvements** over established baselines (DNABERT-2, VQDNA, Nucleotide Transformer) across 12 diverse genomic tasks. These gains represent a clear improvement in performance on a standardized benchmark, achieved with models that are **much** smaller in parameter count and trained on sub
1. The biggest weaknesses of this paper are the presentation, writing, and clarity. It was quite hard for me to parse the method in its entirety, as well as the presentation of the various training configurations of the BioArc models (only-ft, ...). In fact, the paper has several places where the writing is extremely concise, to the point of being a blurb (e.g., A.8.6, L1187). A prime example of writing sloppiness can be seen in Appendix A.8.7 (L1271), where the section is woefully incomplete and
- The idea of applying techniques in Neural Architecture Search for discovery of patterns and design principles for biological foundation models is novel and interesting
- The paper is very detailed and considers different important aspects of biological foundation models such as tokenizations and architectural design aspects such as type of block used, the width of the network and the depth of the network.
- The paper is very well written and clearly structured in most parts
- The contributio
- The techniques of weight sharing, random path sampling followed by training of a predictor are quite well studied in the NAS literature and across applications [1,2,3]; the application of these techniques to biological foundation models is, however, novel.
- The experimental setup is not very clear in some parts:
  - Could the authors elaborate on the total compute budget for finetuning architectures? How much faster is the convergence of a model initialised from supernet weights vs. a model train
Taxonomy
Topics: Machine Learning in Bioinformatics · Machine Learning in Materials Science · Bioinformatics and Genomic Networks
