Augmenting Biological Fitness Prediction Benchmarks with Landscapes Features from GraphFLA

Mingyu Huang; Shasha Zhou; Ke Li

arXiv:2510.24826·cs.LG·October 30, 2025

TL;DR

This paper introduces GraphFLA, a Python framework that enhances biological fitness prediction benchmarks by adding landscape topographical features, enabling better interpretation and comparison of model performance across diverse biological datasets.

Contribution

GraphFLA provides a novel method to analyze and interpret fitness landscapes with biologically relevant features, improving benchmarking of mutational effect prediction models.

Findings

01

GraphFLA characterizes the topography of over 5,300 landscapes drawn from ProteinGym, RNAGym, and CIS-BP.

02

Application of GraphFLA reveals factors influencing model accuracy.

03

Release of extensive empirical fitness landscapes for future research.

Abstract

Machine learning models increasingly map biological sequence-fitness landscapes to predict mutational effects. Effective evaluation of these models requires benchmarks curated from empirical data. Despite their impressive scales, existing benchmarks lack topographical information regarding the underlying fitness landscapes, which hampers interpretation and comparison of model performance beyond averaged scores. Here, we introduce GraphFLA, a Python framework that constructs and analyzes fitness landscapes from mutagenesis data in diverse modalities (e.g., DNA, RNA, protein, and beyond) with up to millions of mutants. GraphFLA calculates 20 biologically relevant features that characterize 4 fundamental aspects of landscape topography. By applying GraphFLA to over 5,300 landscapes from ProteinGym, RNAGym, and CIS-BP, we demonstrate its utility in interpreting and comparing the performance…
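To make the idea of a graph-based fitness landscape concrete, here is a minimal sketch in plain Python. It does not use GraphFLA's API, which this summary does not describe. The helper names (`build_landscape`, `local_optima`), the toy data, and the choice of feature are assumptions for illustration only. The sketch treats each genotype as a node, connects genotypes that differ by a single substitution, and counts local fitness optima, a commonly used indicator of landscape ruggedness that may or may not match any of the 20 features GraphFLA reports.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def build_landscape(fitness):
    """Map each genotype to its single-mutation neighbours (hypothetical helper).

    `fitness` is a dict of sequence -> measured fitness. The pairwise scan is
    quadratic in the number of genotypes, so it only suits small toy data.
    """
    neighbours = {g: [] for g in fitness}
    for a, b in combinations(fitness, 2):
        if len(a) == len(b) and hamming(a, b) == 1:
            neighbours[a].append(b)
            neighbours[b].append(a)
    return neighbours


def local_optima(fitness, neighbours):
    """Genotypes that are strictly fitter than every one of their neighbours."""
    return sorted(
        g for g, nbrs in neighbours.items()
        if all(fitness[g] > fitness[n] for n in nbrs)
    )


# Toy two-site landscape (made-up values): both single mutants are less fit
# than either the wild type "AA" or the double mutant "TT".
toy_fitness = {"AA": 1.0, "AT": 0.4, "TA": 0.5, "TT": 0.9}
neighbours = build_landscape(toy_fitness)
peaks = local_optima(toy_fitness, neighbours)
print(peaks)
```

In this toy case the landscape has two peaks separated by a fitness valley, which is the kind of topographical detail that an averaged prediction score over the four variants would not reveal.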
