# A longitudinal analysis of function annotations of the human proteome reveals consistently high biases

**Authors:** An Phan, Parnal Joshi, Claus Kadelka, Iddo Friedberg

PMC · DOI: 10.1093/database/baaf036 · Database: The Journal of Biological Databases and Curation · 2025-05-07

## TL;DR

This study shows that functional knowledge about human proteins is highly uneven and has remained so over the last decade, leaving many proteins under-researched despite their potential importance.

## Contribution

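The paper applies data analysis tools from economics and information theory to functional annotation disparities in the human proteome: quantified knowledge of a protein's function is treated as the analogue of wealth, and its distribution across proteins is examined the way wealth distribution is studied in societies.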

## Key findings

- The distribution of functional knowledge across human proteins is highly skewed and has remained so over the past decade.
- There is a large gap between the knowledge about protein function captured in databases and the interest in proteins reflected by mentions in the scientific literature.
- The study suggests redirecting research efforts to less-studied proteins to reduce disparities.
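
The knowledge-versus-interest comparison in the second finding can be illustrated with a rank correlation. The sketch below is only an illustration: this summary does not say which correlation measure the authors used, so Spearman's rank correlation is an assumption here, and the function names and the per-protein counts in the usage note are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


def spearman(knowledge, interest):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(knowledge) != len(interest) or len(knowledge) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(knowledge), average_ranks(interest))
```

As a usage example, `spearman(annotation_counts, mention_counts)` would take one hypothetical annotation count and one hypothetical literature mention count per protein, in the same protein order. A value near 1 means the two rankings agree; a low value would indicate the kind of gap the finding describes.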

## Abstract

The resources required to study gene function are limited, especially when considering the number of genes in the human genome and the complexity of their function. Therefore, genes are prioritized for experimental studies based on many different considerations, including, but not limited to, perceived biomedical importance, such as disease-associated genes, or the understanding of biological processes, such as cell signalling pathways. At the same time, most genes are not studied or are under-characterized, which hampers our understanding of their function and potential effects on human health and wellness. Understanding function annotation disparity is a necessary first step toward understanding how much functional knowledge is gained from the human genome, and toward guidelines for targeting future studies of human genes more effectively. Here, we present a comprehensive longitudinal analysis of the human proteome utilizing data analysis tools from economics and information theory. Specifically, we view the human proteome as a population of proteins within a knowledge economy: we treat the quantified knowledge of a protein’s function as the analogue of wealth and examine the distribution of information across the proteins in the proteome in the same manner that the distribution of wealth is studied in societies. Our results show a highly skewed distribution of information about human proteins over the last decade, in which the inequality in the annotations given to the proteins remains high. Additionally, we examine the correlation between the knowledge about protein function as captured in databases and the interest in proteins as reflected by mentions in the scientific literature. We show a large gap between knowledge and interest and dissect the factors leading to this gap. In conclusion, our study shows that research efforts should be redirected to less studied proteins to mitigate the disparity among human proteins both in databases and literature.
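
To make the "knowledge as wealth" analogy concrete, here is a minimal sketch of one standard inequality measure from economics, the Gini coefficient, applied to per-protein knowledge scores. The abstract does not name the specific measures used, so the choice of the Gini coefficient is an assumption, and the function name and input values are hypothetical.

```python
def gini(knowledge_scores):
    """Gini coefficient of non-negative scores.

    0 means every protein has the same amount of knowledge; for n proteins
    the maximum is (n - 1) / n, reached when one protein holds all of it.
    """
    xs = sorted(knowledge_scores)
    if any(x < 0 for x in xs):
        raise ValueError("scores must be non-negative")
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen for this sketch: no knowledge, no inequality
    # Closed form over ascending values x_1..x_n:
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n
```

For example, `gini([1, 1, 1, 1])` is 0 (perfect equality), while `gini([0, 0, 0, 10])` is 0.75, the maximum for four proteins. Computing such a value once per database release would give a longitudinal picture like the one the abstract describes.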

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12060720/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12060720/full.md

## References

84 references — full list in the complete paper: https://tomesphere.com/paper/PMC12060720/full.md

---
Source: https://tomesphere.com/paper/PMC12060720