# pandasPGS: a Python package for easy retrieval of Polygenic Score Catalog data

**Authors:** Zheyu Zhang, Jintong Zhou, Tianze Cao, Yuexia Huang, Chu Huang, Yu Xia

PMC · DOI: 10.7717/peerj.18985 · PeerJ · 2025-02-12

## TL;DR

pandasPGS is a Python tool that simplifies accessing and analyzing data from the Polygenic Score Catalog.

## Contribution

pandasPGS introduces a Python package for automated and streamlined access to the PGS Catalog's REST API.

## Key findings

- pandasPGS automatically selects URLs and merges paginated data from the PGS Catalog.
- The package converts data into hierarchical pandas.DataFrame objects for easier analysis.
- It reduces the learning curve for researchers using the PGS Catalog's API.
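
The first finding (URL selection plus merging of paginated data) can be illustrated with a minimal sketch. This is not pandasPGS's implementation. The helper names `build_url` and `fetch_all_pages` are hypothetical, and the base URL, the `score/search` path, the `pmid` filter, and the `next`/`results` response fields are assumptions about the PGS Catalog REST API that should be checked against its documentation. The example runs offline against canned pages and their placeholder records.

```python
from typing import Callable, Dict, List, Optional
from urllib.parse import urlencode

# Assumed base URL of the PGS Catalog REST API (verify against the official docs).
BASE_URL = "https://www.pgscatalog.org/rest"


def build_url(resource: str, **params: str) -> str:
    """Hypothetical helper: map a resource name and filters to a request URL."""
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{resource}" + (f"?{query}" if query else "")


def fetch_all_pages(first_url: str, fetch_page: Callable[[str], Dict]) -> List[dict]:
    """Follow each page's 'next' link and merge every page's 'results' list."""
    merged: List[dict] = []
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        merged.extend(page.get("results", []))
        url = page.get("next")  # None (or missing) on the last page ends the loop
    return merged


# Offline stand-in for HTTP responses; the page layout is an assumption.
first_url = build_url("score/search", pmid="25855707")
second_url = first_url + "&offset=2"
FAKE_PAGES = {
    first_url: {"next": second_url, "results": [{"id": "PGS000001"}, {"id": "PGS000002"}]},
    second_url: {"next": None, "results": [{"id": "PGS000003"}]},
}

records = fetch_all_pages(first_url, FAKE_PAGES.__getitem__)
merged_ids = [record["id"] for record in records]
```

Passing the page fetcher in as a function keeps the merging logic testable without network access. In real use it could be replaced by something like `lambda u: requests.get(u, timeout=30).json()`, assuming the `requests` library is installed.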

## Abstract

The Polygenic Score (PGS) Catalog is a public database dedicated to storing polygenic risk scores. To date, the database includes 5,022 polygenic risk scores associated with 656 different traits. Although the PGS Catalog offers an official representational state transfer (REST) application programming interface (API), there is no ready-made data client tailored to any specific programming language. Researchers must therefore spend time learning the structure of the REST API and implementing a client in their programming language of choice before they can integrate PGS data into their analytical workflows.

In this work we introduce pandasPGS, a Python package that provides programmatic access to PGS Catalog data. When a researcher calls one of its functions, pandasPGS automatically selects the appropriate uniform resource locator (URL) based on the function's name and parameters, requests the data, and merges the paginated results. pandasPGS also provides data pre-processing functions: depending on the structure of the returned data, it can convert the data into several hierarchical pandas.DataFrame objects, which are convenient for further analysis.

This tool allows researchers to easily analyze PGS Catalog data using Python and reduces the time they need to spend learning the PGS Catalog REST API. The source code is available at https://github.com/tianzelab/pandaspgs, and the API documentation is available at https://tianzelab.github.io/pandaspgs/.
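
The conversion of nested API records into several hierarchical pandas.DataFrame objects, as described in the abstract, can be sketched as follows. This is an illustration of the general idea rather than pandasPGS's own code. The record layout and the field names (`id`, `name`, `samples`, `sample_number`, `ancestry`) are invented for the example, and the `score_id` key column is an assumed naming choice.

```python
import pandas as pd

# Invented records shaped like nested JSON; not real PGS Catalog data.
records = [
    {
        "id": "PGS000001",
        "name": "score_a",
        "samples": [
            {"sample_number": 100, "ancestry": "European"},
            {"sample_number": 50, "ancestry": "East Asian"},
        ],
    },
    {
        "id": "PGS000002",
        "name": "score_b",
        "samples": [{"sample_number": 200, "ancestry": "European"}],
    },
]

# Parent table: one row per score, keeping only the scalar (non-list) fields.
scores = pd.DataFrame(
    [{k: v for k, v in rec.items() if not isinstance(v, list)} for rec in records]
)

# Child table: one row per nested sample, carrying the parent id as a key
# so the two tables can be joined again later.
samples = pd.json_normalize(records, record_path="samples", meta=["id"]).rename(
    columns={"id": "score_id"}
)

# Re-joining on the key recovers a flat view when one is needed.
flat = samples.merge(scores, left_on="score_id", right_on="id")
```

Keeping the nested lists in a separate child table avoids duplicating the parent's fields on every row, while the shared key preserves the hierarchy.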

## Full-text entities

- **Diseases:** dystocia (MESH:D004420), premature delivery (MESH:C536271), postpartum hemorrhage (MESH:D006473), type 2 diabetes mellitus (MESH:D003924), trauma (MESH:D014947), pregnancy complications (MESH:D011248), Gestational diabetes (MESH:D016640), diabetes (MESH:D003920), gestational hypertension (MESH:D046110), Carbohydrate intolerance (MESH:C562602)
- **Chemicals:** quincunx (-), IP (MESH:C041508)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Mutations:** rs1436953, rs7172432, rs16955379, rs10830963
- **Cell lines:** S2 — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11829626/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11829626/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/PMC11829626/full.md

---
Source: https://tomesphere.com/paper/PMC11829626