PANGEN: an online platform for the comparison and creation of diagnostic gene panels

Ofer Isakov; Dina Marek-Yagel; Rotem Greenberg; Michal Naftali; Shay Ben-Shachar

PMC · DOI:10.1093/database/baae065·July 23, 2024

PANGEN: an online platform for the comparison and creation of diagnostic gene panels

Ofer Isakov, Dina Marek-Yagel, Rotem Greenberg, Michal Naftali, Shay Ben-Shachar

PDF

Open Access

TL;DR

PANGEN is a web platform that helps compare and create diagnostic gene panels by collecting gene-phenotype associations from multiple sources.

Contribution

PANGEN introduces a centralized platform for comparing and generating diagnostic gene panels using curated gene-phenotype associations and genomic data.

Findings

01

PANGEN collected data from 12 sources, including 1533 panels, 9759 phenotypes, and 6979 genes.

02

The platform allows users to generate new panels and stratify them into core and extended genes based on phenotype associations.

03

Pairwise panel similarity is calculated to facilitate comparison between gene panels.

Abstract

Targeted gene panel sequencing is used to limit the search for causative genetic variants solely to genes with an established association with the phenotype. The design of gene panels is challenging due to the lack of consensus regarding phenotypic associations for some genes, which results in high variation in gene composition for the same panel offered by different laboratories. We developed PANGEN, a platform that provides a centralized resource for gene panel information, with the ability to compare and generate new intelligent diagnostic panels. Gene–phenotype associations were collected from 12 public and commercial sources (Blueprint, Cegat, Centogene, ClinGen, Fulgent, GeneDx, Health in Code, Human Phenotype Ontology, Invitae, PanelApp, Prevention genetics, and Pronto diagnostics). Gene–phenotype associations are categorized into tiers according to categories derived from the…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Homo sapiens(human · species)

Figures2

Click any figure to enlarge with its caption.

PANGEN facilitates targeted gene panel generation by allowing users to upload gene lists and interactively explore available panels by phenotype, identify similar panels, compare gene content, and generate personalized gene panels.

The online platform has several tabs the user select to perform various tasks. (A) Panel browser—search for specific gene panels of interest. (B) Panel comparison—compare a selected panel with other similar panels and generate a new panel based on user-defined combinations. (C) Panel combination—generate a two-tier panel based on a combination of selected ‘core’ and ‘extended’ panels, and (D) Gene list upload—upload a gene list that can then be compared and combined with other panels.

Equations1

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document}

S\left( {A,B} \right) = \frac{{\mid A\mathop \cap \nolimits^ B\mid }}{{\mid A\mathop \cup \nolimits^ B|}}

\end{document}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsGene expression and cancer classification · Genomics and Rare Diseases · Biomedical Text Mining and Ontologies

Full text

Background

Targeted gene panel sequencing is used to limit the search for causative genetic variants to genes associated with specific phenotypes, thereby maximizing target coverage and avoiding incidental findings [1, 2]. Targeted panels may be defined prior to sequencing, during library preparation steps, or as virtual gene panels based on whole exome/genome sequencing panels (WES/WGS). Such virtual panels are becoming increasingly common, as they enable dynamic panel design considering various degrees of gene–phenotype association [2]. The process of gene panel design presents a challenge due to the lack of consensus across the scientific community and commercial sources regarding gene–phenotype associations. Collaborative tools, such as PanelApp, facilitate gene panel design based on expert review and include mostly genes with a strong association to the phenotype [3]. While this approach controls the amount of identified variants of unknown significance (VUS), it limits the sensitivity of the panel to identify causative variants in genes with weaker phenotypic association. Consequently, there is considerable variability in the composition of gene panels that address the same clinical condition, making it challenging to determine the most appropriate panel for a given case. This variability can lead to inefficiencies in genetic testing and potentially missed diagnoses, thereby underscoring the need for a standardized and centralized resource. A comprehensive assessment of the spectrum of available panels for a given phenotype facilitates the inclusion of genes that could otherwise be missed while considering the origin of each gene in the panel. Careful virtual gene panel design is critical since these panels have been shown to result in an increase in reported VUS compared to WES, due in part to differences in reporting practices, and therefore consideration should be taken when deciding on their gene content [4]. This necessitates performing several labour-intensive tasks, such as retrieving gene content from various providers for relevant panels, consolidating and differentiating these panels, and prioritizing genes based on their source. In light of these challenges, we have developed PANGEN (targeted gene PANel GENerator), a unique platform that represents a unified, comprehensive resource for gene panel information. Our platform not only aggregates data from a diverse set of public and commercial sources but also offers advanced tools for panel comparison and customization.

Methods

Data collection and compilation

An extensive data collection effort was conducted by gathering gene–phenotype associations from a diverse set of 11 public and commercial sources. These sources include: Blueprint Genetics (https://blueprintgenetics.com/), Cegat (https://cegat.com/), Centogene (https://www.centogene.com/), ClinGen (https://clinicalgenome.org/), Fulgent Genetics (https://www.fulgentgenetics.com/), GeneDx (https://www.genedx.com/), Health in Code (https://healthincode.com/en/), Invitae (https://www.invitae.com/), PanelApp (https://panelapp.genomicsengland.co.uk/), Prevention genetics (https://www.preventiongenetics.com/), and Pronto diagnostics (https://www.prontodiagnostics.com/). Additionally, we incorporated data from the Human Phenotype Ontology [5] to enrich the phenotypic information associated with genes. Data were obtained exclusively from publicly available, online resources. Gene–phenotype association strength was categorized into tiers. Tiers were retrieved, where available, from each panel’s source specifications. Overall, three sources use three tiers to categorize gene–phenotype association (ClinGen, Health in Code, and PanelApp), one source uses two tiers (Invitae), and the rest use only a single list of genes per panel.

Panel similarity calculation

To provide users with the capability to compare gene panels effectively, pairwise panel similarity was calculated for all possible panel pairs. Similarity calculation is based on a metric that takes into account the overlap of genes between two panels relative to the total number of genes in both panels. Mathematically, the similarity score (S) between two panels A and B is computed as follows:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} $$S\left( {A,B} \right) = \frac{{\mid A\mathop \cap \nolimits^ B\mid }}{{\mid A\mathop \cup \nolimits^ B|}}$$\end{document}

Here, |A ∩ B| represents the number of genes that are present in both panel A and panel B, and |A ∪ B| represents the total number of unique genes in both panels.

GC content and genomic duplication data

To enhance the utility of the platform, we incorporate information corresponding to the sequencing complexity of each gene: specifically, we annotate gene regions with extreme guanine-cytosine (GC) content and putative genomic duplications. Genes in the designed panel are reviewed and those containing regions that are difficult to sequence and/or analyse are marked as such. The user may use this information and decide whether to issue a warning for panel genes that are difficult to analyse, recommend orthogonal validation, or even decide that given the expected complexity, it is not feasible to run the panel. GC content information was retrieved from the Genome in a Bottle stratifications dataset [6]. Putative genomic duplications [7] were retrieved from the University of Santa Cruz table browser data retrieval tool [8].

Implementation

The platform was developed using R statistical software (version 4.0.2; R Foundation for Statistical Computing, Vienna, Austria) and the shiny R package [9].

Functional features

We have collected 6979 genes corresponding to 1533 panels and 9759 phenotypes. The platform was designed as an adaptable online interface and modified according to the feedback provided by our team of variant scientists. The platform offers various features for panel exploration and manipulation (Fig. 1): (i) Panel browser tab—users can search for specific gene panels of interest or browse through panels associated with particular clinical conditions or phenotypes or search for panels containing specific genes (Fig. 2A). (ii) Panel comparison tab—that compares a selected panel with all other collected panels (Fig. 2B). Panels can then be sorted by similarity, facilitating the addition of genes from similar panels and comparison with expert-opinion based panels (i.e. PanelApp). In this tab, the user can compose a new gene panel by adding genes from various sources. Panel similarity is a critical feature of our platform, allowing users to identify shared genes and commonalities between different panels. This information aids in the selection and customization of panels for specific research or clinical applications. (iii) Panel validation tabs—users can access duplication and GC content data on the platform. Genomic duplications and extreme GC content can complicate sequence alignment, leading to ambiguity in variant calling. To address this concern, we retrieve and present information on such putative genomic regions. These data aid users in assessing their gene panels and potential technical pitfalls in the downstream analysis. (iv) Panel combination tab, in which the user can select gene panels to combine (Fig. 2C). The platform allows users to stratify a generated panel into two categories: genes with strong phenotype associations (‘core’) and genes with weaker associations (‘extended’). This stratification helps clinicians and researchers to prioritize genes based on their perceived relevance to the phenotype under investigation. (v) Gene list upload tab, in which the user may upload a gene list that can then be compared and combined with other collected panels (Fig. 2D).

PANGEN facilitates targeted gene panel generation by allowing users to upload gene lists and interactively explore available panels by phenotype, identify similar panels, compare gene content, and generate personalized gene panels.

The online platform has several tabs the user select to perform various tasks. (A) Panel browser—search for specific gene panels of interest. (B) Panel comparison—compare a selected panel with other similar panels and generate a new panel based on user-defined combinations. (C) Panel combination—generate a two-tier panel based on a combination of selected ‘core’ and ‘extended’ panels, and (D) Gene list upload—upload a gene list that can then be compared and combined with other panels.

Implementation

We utilized the platform and generated a set of 43 gene panels using the following procedure: (i) A PanelApp panel is selected according to a prespecified phenotype. (ii) All the genes in the PanelApp panel with either a ‘green’ or ‘amber’ tiers were assigned as ‘core’ panel genes. (iii) We then collect all related panels from other sources. Selection was based either on the name of the panels or their calculated similarity with the selected PanelApp panel. Every gene found in any of the other similar panels was assigned as ‘extended’ panel gene. These panels were uploaded to the platform and are currently available for comparison and review. Subsequently, during variant analysis, variants may be stratified and reported according to whether they affect ‘Core’ or ‘Extended’ genes by setting a lower reporting threshold for variants affecting ‘Core’ genes, as they are more likely to be associated with the phenotype in question. The compiled gene panels may be used for both targeted analysis and to narrow down the search space to the most phenotypically relevant genes in broader tests, such as WES/WGS.

The platform represents a unique, unified resource for gene panel exploration, comparison, and design that facilitates the combination of genes from different panels and from various sources to create panels tailored to specific research or clinical needs.

Bibliography9

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Bean LJH , Funke B, Carlston CM et al. Diagnostic gene sequencing panels: from design to report-a technical standard of the American College of Medical Genetics and Genomics (ACMG). Genet Med 2020;22:453–61.doi: 10.1038/s 41436-019-0666-z 31732716 · doi ↗ · pubmed ↗
2Molina-Ramírez LP , Kyle C, Ellingford JM et al. Personalised virtual gene panels reduce interpretation workload and maintain diagnostic rates of proband-only clinical exome sequencing for rare disorders. J Med Genet 2022;59:393–98.doi: 10.1136/jmedgenet-2020-10730333879512 PMC 8961756 · doi ↗ · pubmed ↗
3Martin AR , Williams E, Foulger RE et al. Panel App crowdsources expert knowledge to establish consensus diagnostic gene panels. Nat Genet 2019;51:1560–65.doi: 10.1038/s 41588-019-0528-231676867 · doi ↗ · pubmed ↗
4Rehm HL , Alaimo JT, Aradhya S et al. The landscape of reported VUS in multi-gene panel and genomic testing: time for a change. Genet Med 2023;25:100947.doi: 10.1016/j.gim.2023.100947 PMC 1082506137534744 · doi ↗ · pubmed ↗
5Robinson PN , Mundlos S. The human phenotype ontology. Clin Genet 2010;77:525–34.doi: 10.1111/j.1399-0004.2010.01436.x 20412080 · doi ↗ · pubmed ↗
6Olson ND , Wagner J, Mc Daniel J et al. Precision FDA Truth Challenge V 2: calling variants from short and long reads in difficult-to-map regions. Cell Genomics 2022;2:100129.doi: 10.1016/j.xgen.2022.100129 PMC 920542735720974 · doi ↗ · pubmed ↗
7Bailey JA , Gu Z, Clark RA et al. Recent segmental duplications in the human genome. Science 2002;297:1003–07.doi: 10.1126/science.107204712169732 · doi ↗ · pubmed ↗
8Karolchik D , Hinrichs AS, Furey TS et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res 2004;32:D 493–496.doi: 10.1093/nar/gkh 10314681465 PMC 308837 · doi ↗ · pubmed ↗