# Assessing the Potential of Ancient Protein Sequences in the Study of Hominid Evolution

**Authors:** Ioannis Patramanis, Laurits Skov, Enrico Cappellini, Fernando Racimo

PMC · DOI: 10.1093/gbe/evag035 · Genome Biology and Evolution · 2026-02-27

## TL;DR

This study assesses how much ancient protein sequences can reveal about hominid evolution, showing both their potential and their limitations for reconstructing evolutionary relationships.

## Contribution

The study quantifies the information content of 12 enamel and collagen proteins previously reported from fossil material and evaluates how well they support phylogenetic reconstruction compared with DNA data.

## Key findings

- Some ancient proteins contain more evolutionary information than others due to differential sequence conservation.
- Protein data can sometimes lead to incorrect tree topologies due to information loss compared to DNA data.
- Concatenating multiple proteins improves resolution, but protein-based trees may not match genome-wide population trees, especially for closely related hominid groups.
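
The information loss mentioned in the findings above can be illustrated with a minimal sketch: synonymous DNA substitutions disappear once a coding sequence is translated, so two sequences can differ at more DNA sites than protein sites. The two short sequences below are hypothetical toy data, not taken from the paper, and the helper names `translate` and `count_differences` are illustrative assumptions rather than the authors' pipeline.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding DNA sequence in frame 0, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_differences(seq_a, seq_b):
    """Count positions at which two equal-length aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same aligned length")
    return sum(x != y for x, y in zip(seq_a, seq_b))


# Hypothetical toy sequences: one synonymous change (GGT -> GGC)
# and one non-synonymous change (GCT -> TCT, Ala -> Ser).
seq_a = "GGTCCTGCTGGA"
seq_b = "GGCCCTTCTGGA"

dna_diffs = count_differences(seq_a, seq_b)
protein_diffs = count_differences(translate(seq_a), translate(seq_b))
print(f"DNA differences: {dna_diffs}, protein differences: {protein_diffs}")
```

In this toy case only the non-synonymous change survives translation, which is the sense in which peptide data are less informationally dense than the underlying DNA.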

## Abstract

Palaeoproteomic data can provide invaluable insights into hominid evolution over long timescales. Yet, the potential and limitations of ancient protein sequences to resolve evolutionary relations between species remain largely unexplored. In this study, we aim to quantify how much information about these relations can be obtained from limited ancient protein data, at the scale that is currently available or will be available in the near future. We harness sequence alignments of 12 enamel and collagen proteins that have been previously reported in fossil material that is at least 1 million years old. We utilize in silico translations of hominid DNA sequences of these proteins and highlight their differential sequence conservation, indicating some of them contain much larger amounts of information than others. We also evaluate the extent to which inferred topologies from protein data differ from inferred topologies from the more informationally dense DNA data. We show that the former may sometimes lead to inferences of the wrong tree topology due to the informational loss that comes when working with peptide data. Additionally, we determine the number of concatenated proteins necessary to confidently reconstruct the population/species tree summarizing the relations between humans, chimpanzees, and gorillas, as well as those between modern humans, Neanderthals, and Denisovans. As expected, increasing the number of proteins in a concatenation enhances resolution, but we note that trees inferred from the full set of collagen and enamel proteins do not necessarily correspond to population trees inferred from genome-wide data. We show this is especially the case in the closely related groups of our recent ancestors. We further demonstrate that while a number of proteins fall within archaic introgressed haplotypes of present-day humans, ancient admixture is not the main source of the observed tree incongruence. Our study underscores the potential and limitations of utilizing palaeoproteomic data in deep-time phylogenetic reconstructions, indicating that these will be aided not only by increased recovery of proteins in the future, but also by more careful modeling of evolutionary relations across the genome, beyond simply building single phylogenetic trees.
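
To make the concatenation idea in the abstract concrete, here is a rough sketch that joins several per-protein alignments and counts, for a three-taxon ingroup plus an outgroup, how many sites support each possible pairing. This is a simple parsimony-style tally chosen for illustration, not the inference method used in the paper; the alignments are hypothetical toy data, and `concatenate` and `topology_support` are assumed helper names.

```python
from collections import Counter


def concatenate(alignments, taxa):
    """Join per-protein alignments (dicts of taxon -> sequence) end to end."""
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


def topology_support(aln, ingroup, outgroup):
    """Count sites where two ingroup taxa share a state that differs from
    the third ingroup taxon, which in turn matches the outgroup."""
    a, b, c = ingroup
    support = Counter()
    for sa, sb, sc, so in zip(aln[a], aln[b], aln[c], aln[outgroup]):
        if "-" in (sa, sb, sc, so):
            continue  # skip columns with missing data
        if sa == sb != sc and sc == so:
            support[(a, b)] += 1
        elif sa == sc != sb and sb == so:
            support[(a, c)] += 1
        elif sb == sc != sa and sa == so:
            support[(b, c)] += 1
    return support


taxa = ["human", "chimp", "gorilla", "orangutan"]

# Hypothetical toy alignments; real proteins are far longer.
protein_1 = {"human": "MKTA", "chimp": "MKTA", "gorilla": "MRTA", "orangutan": "MRTA"}
protein_2 = {"human": "GPSG", "chimp": "GPAG", "gorilla": "GPSG", "orangutan": "GPAG"}
protein_3 = {"human": "LDE-", "chimp": "LDEQ", "gorilla": "LNEQ", "orangutan": "LNEQ"}

concatenated = concatenate([protein_1, protein_2, protein_3], taxa)
support = topology_support(concatenated, ("human", "chimp", "gorilla"), "orangutan")
for pair, n_sites in support.most_common():
    print(pair, n_sites)
```

In the toy data, one protein supports a different pairing from the other two, so the majority signal only emerges after concatenation. With few informative sites per protein, a single locus can easily favor a tree that disagrees with the genome-wide population tree.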

## Linked entities

- **Species:** Homo sapiens (taxon 9606)

## Full-text entities

- **Genes:** AHSG (alpha 2-HS glycoprotein) [NCBI Gene 197] {aka A2HS, AHS, APMR1, FETUA, HSGA}, MMP8 (matrix metallopeptidase 8) [NCBI Gene 4317] {aka CLG1, HNC, MMP-8, PMNL-CL}, MMP27 (matrix metallopeptidase 27) [NCBI Gene 64066] {aka MMP-27}, COL5A2 (collagen type V alpha 2 chain) [NCBI Gene 1290] {aka EDSC, EDSCL2}, ENAM (enamelin) [NCBI Gene 10117] {aka ADAI, AI1C, AIH2}, AMTN (amelotin) [NCBI Gene 401138] {aka AI3B, UNQ689}, COL22A1 (collagen type XXII alpha 1 chain) [NCBI Gene 169044], COL2A1 (collagen type II alpha 1 chain) [NCBI Gene 1280] {aka ACG2, ANFH, ANFH1, AOM, COL11A3, EDMMD}, SERPINF1 (serpin family F member 1) [NCBI Gene 5176] {aka EPC-1, OI12, OI6, PEDF, PIG35}, AMBN (ameloblastin) [NCBI Gene 258] {aka AI1F}, USP46 [NCBI Gene 101148524], ODAM (odontogenic, ameloblast associated) [NCBI Gene 54959] {aka APIN}, COL11A2 (collagen type XI alpha 2 chain) [NCBI Gene 1302] {aka DFNA13, DFNB53, FBCG2, HKE5, OSMEDA, OSMEDB}, LUM (lumican) [NCBI Gene 4060] {aka LDC, SLRR2D}, COL1A2 (collagen type I alpha 2 chain) [NCBI Gene 1278] {aka EDSARTH2, EDSCV, OI4}, POSTN (periostin) [NCBI Gene 10631] {aka OSF-2, OSF2, PDLPOSTN, PN}, COL5A1 (collagen type V alpha 1 chain) [NCBI Gene 1289] {aka EDSC, EDSCL1, FMDMF}, COL11A1 (collagen type XI alpha 1 chain) [NCBI Gene 1301] {aka CO11A1, COLL6, DFNA37, STL2}, COL12A1 (collagen type XII alpha 1 chain) [NCBI Gene 1303] {aka BA209D8.1, BTHLM2, COL12A1L, DJ234P15.1, EDSMYP, UCMD2}, H2BC12L (H2B clustered histone 12 like) [NCBI Gene 54145] {aka H2B/s, H2BFS, H2BS1}, COL17A1 (collagen type XVII alpha 1 chain) [NCBI Gene 1308] {aka BA16H23.2, BP180, BPA-2, BPAG2, ERED, JEB4}, COL5A3 (collagen type V alpha 3 chain) [NCBI Gene 50509], BGN (biglycan) [NCBI Gene 633] {aka DSPG1, MRLS, PG-S1, PGI, SEMDX, SLRR1A}, FGG [NCBI Gene 101134795], SPIN1 (spindlin 1) [NCBI Gene 10927] {aka SPIN, TDRD24}, COL1A1 (collagen type I alpha 1 chain) [NCBI Gene 1277] {aka CAFYD, EDSARTH1, EDSC, OI1, OI2, OI3}, COL3A1 (collagen type III alpha 1 chain) [NCBI Gene 1281] {aka EDS4A, EDSVASC, PMGEDSV}, ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}, MMP7 (matrix metallopeptidase 7) [NCBI Gene 4316] {aka MMP-7, MPSL1, PUMP-1}, COL4A4 (collagen type IV alpha 4 chain) [NCBI Gene 1286] {aka ATS2, BFH, BFH1, CA44}, AMELY (amelogenin Y-linked) [NCBI Gene 266] {aka AMGL, AMGY}, AMELX (amelogenin X-linked) [NCBI Gene 265] {aka AI1E, AIH1, ALGN, AMG, AMGL, AMGX}, MMP20 (matrix metallopeptidase 20) [NCBI Gene 9313] {aka AI2A2, MMP-20}, FGB [NCBI Gene 101133258]
- **Diseases:** ILS (MESH:D015456)
- **Chemicals:** amino (-), amino acids (MESH:D000596)
- **Species:** Pan paniscus (bonobo, species) [taxon 9597], Hominidae (great apes, family) [taxon 9604], Pan troglodytes (chimpanzee, species) [taxon 9598], Pongo abelii (orang utan, species) [taxon 9601], Pseudomonas sp. AN (species) [taxon 534632], Homo sapiens (human, species) [taxon 9606], Gorilla gorilla (gorilla, species) [taxon 9593]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12962811/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12962811/full.md

## References

136 references — full list in the complete paper: https://tomesphere.com/paper/PMC12962811/full.md

---
Source: https://tomesphere.com/paper/PMC12962811