Developing Del2Phen: A Novel Phenotype Description Tool for Chromosome Deletions

Eleana Rraku; Tyler D. Medina; Conny M. A. van Ravenswaaij-Arts; Mariska K. Slofstra; Morris A. Swertz; Trijnie Dijkhuizen; Lennart F. Johansson; Aafke Engwerda

PMC · DOI:10.1155/humu/5239482·January 2, 2026

Developing Del2Phen: A Novel Phenotype Description Tool for Chromosome Deletions

Eleana Rraku, Tyler D. Medina, Conny M. A. van Ravenswaaij-Arts, Mariska K. Slofstra, Morris A. Swertz, Trijnie Dijkhuizen, Lennart F. Johansson, Aafke Engwerda

PDF

Open Access

TL;DR

Del2Phen is a new tool that helps describe health effects of chromosome 6 deletions by comparing genetic data from patients and literature.

Contribution

Del2Phen is a novel software tool that generates phenotype descriptions for chromosome deletions using genotypic similarity and haploinsufficiency data.

Findings

01

Del2Phen was developed using data from over 500 individuals with chromosome 6 deletions.

02

The tool uses haploinsufficiency and gene-phenotype relationships to determine genotypic similarity.

03

Del2Phen can be adapted for other chromosomes and duplications with sufficient data.

Abstract

Information on the health‐related consequences of rare chromosome disorders is often limited, posing challenges for both patients and their families. The Chromosome 6 Project aims to bridge this knowledge gap for structural aberrations involving chromosome 6 by providing parents of affected children with information on the expected phenotypes of their child. To achieve this, detailed phenotype and genotype data are collected directly from parents worldwide and supplemented with data from literature reports, resulting thus far in a dataset of over 500 individuals. This comprehensive data pool was used to develop Del2Phen, a software tool introduced in this paper that generates aberration‐specific phenotype information for chromosome disorders. Del2Phen identifies individuals with a deletion or duplication similar to that of a new patient (index) and produces a clinical description for…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Homo sapiens(human · species)

Diseases2

Chromosome Deletions chromosome disorders

Figures2

Click any figure to enlarge with its caption.

Schematic representation of Del2Phen. The tool uses existing genotype and phenotype data as input; in this study, these are data stored in the Chromosome 6 database (left panel). Based on preset genotypic similarity parameters, Del2Phen identifies individuals with a similar deletion to that of an index patient (indicated in grey in the middle panel). Deletions are identified as similar based on concordance in dominating‐effect gene content and the maximum achievable HI‐gene similarity that yields groups that exceed a certain size (following the algorithm developed in this study and depicted in Figure 3). Details on the definitions of HI‐ and dominating‐effect genes are provided in Section 2.1. Patients identified by the tool as genotypically similar to the index are grouped together, and their phenotypes are used to give a phenotype description for the index patient (right panel).

Genotypic similarity algorithm for Del2Phen′s generation of phenotype descriptions for chromosome 6 deletions. The tool will leverage the maximum possible functional similarity within groups that exceed a certain size for generating a comprehensive phenotype description. At HI‐gene similarity ≥ 80%, a description will be provided for groups of at least 20 individuals. The group size cut‐off will be lowered to 15 for HI‐gene similarities of at least 70% and 60%. Below these similarity thresholds, phenotype descriptions will only be given if a dominating‐effect (DE) gene is present, coupled with the highest HI‐gene similarity yielding groups of at least 15 individuals. In each step, n represents the number of additional individuals in the Chromosome 6 database who become eligible for a phenotype description as the algorithm progresses through the step, whereas N represents the total number of individuals who would or would not receive a phenotype description based on these settings and our current data pool.

Tables3

Table 1. Characteristics of patients included in the current study.

Cohort	Sex female/male ^a	Median age in years; months (range) ^b	Median deletion size in Mb (range) ^c	No. of deletions containing at least one DE gene (fraction from total no. of deletions) ^c	Median no. of HI genes per deletion (range) ^c	Median no. of present phenotypic features per patient (range) ^d
Parent cohort (n = 142)	78/64	6;3 (0;1–48;5)	6.45 (0.05–28.27)	61 (42%)	4 (0–20)	42 (6–117)
Literature cohort (n = 310)	169/139	8;0 (0;0–66;0)	3.81 (0.005–37.85)	152 (48%)	3 (0–21)	13.5 (0–61)
Total (n = 452)	247/203	7;0 (0;0–66;0)	4.46 (0.005–37.85)	213 (46%)	3 (0–21)	19 (0–117)

Table 2. Effect of increasing minimum HI‐gene similarity threshold on group size and PPV.

Minimum HI-gene similarity threshold	Number of individuals with ≥ 15/≥ 20/≥ 30 connections	Median PPV overall ^a	Median PPV if < 15/≥ 15 connections ^b	Median PPV if < 20/≥ 20 connections ^b	Median PPV if < 30/≥ 30 connections ^b
50%	251/215/156	0.286	0.231/0.313	0.25/0.313	0.267/0.313
60%	205/165/84	0.308	0.265/0.333	0.273/0.333	0.304/0.313
70%	161/108/81	0.333	0.297/0.333	0.333/0.333	0.333/0.333
80%	148/108/81	0.333	0.316/0.333	0.313/0.368	0.333/0.379
90%	103/85/35	0.333	0.3/0.379	0.3/0.4	0.32/0.4
100%	103/85/35	0.333	0.3/0.379	0.3/0.4	0.327/0.4

Table 3. Example of a clinical description generated by Del2Phen for a patient with a dominating‐effect gene in the deletion.

Clinical description for patient Id226	Features present in patient Id226
Intrauterine growth retardation	Intrauterine growth retardation
Hospitalisation after birth	Hospitalisation after birth
	Hospitalisation due to congenital problems
	Premature rupture of membranes
	Premature birth
	Neonatal asphyxia
	Prolonged neonatal jaundice
	Feeding problems in infancy
Abnormality of the outer ear
	Abnormality of the skin
	Hypopigmented skin patches
	Capillary haemangioma
Joint hypermobility	Joint hypermobility
Generalised joint laxity	Generalised joint laxity
Abnormal hand morphology
	Abdominal wall defect
	Umbilical hernia
Sacral dimple	Sacral dimple
Abnormality of body height	Abnormality of body height
Short stature	Short stature
Abnormality of the respiratory system
Abnormal heart morphology	Abnormal heart morphology
Abnormal cardiac septum morphology	Abnormal cardiac septum morphology
Abnormal atrial septum morphology
Abnormal heart valve morphology	Abnormal heart valve morphology
Abnormal aortic valve morphology
Bicuspid aortic valve
Abnormal tricuspid valve morphology	Abnormal tricuspid valve morphology
Abnormal mitral valve morphology
Patent ductus arteriosus
Cardiomyopathy
Dilated cardiomyopathy
Muscular hypotonia	Muscular hypotonia
	Seizure
	Febrile seizure
	Normal hearing at newborn screening
Neurodevelopmental delay
Delayed gross motor development
Delayed speech and language development
	Social behaviour
	Helpful behaviour

Phenotype description assessment
Total number of features in description (only using the selected feature in bold)	25 (10)
Number of features in description seen in patient (only using the selected feature in bold)	12 (6)
PPV based on full description (only using the selected feature in bold)	0.48 (0.6)

Funding4

—Crowdfunding from Chromosome 6 Project
—Nederlandse Organisatie voor Wetenschappelijk Onderzoek10.13039/501100003246
—ZonMw10.13039/501100001826
—Universitair Medisch Centrum Groningen10.13039/501100005075

Keywords

chromosome 6Chromosome 6 Projectchromosome disorderparentsphenotype informationtool

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsGenomics and Rare Diseases · Genomic variations and chromosomal abnormalities · Genetics and Neurodevelopmental Disorders

Full text

1. Introduction

Information on the phenotypic consequences of rare chromosome aberrations, including deletions and duplications, is often scarce. This leads to parental uncertainty about their child′s future, and knowledge about potential health issues has been shown to be crucial for parental coping [1, 2]. Better availability of such information would also help healthcare providers deliver appropriate care to affected individuals.

Studying the clinical manifestations of structural chromosome aberrations is challenging for several reasons. The rareness of these aberrations and the geographic spread of affected individuals mean researchers must primarily rely on data from literature reports and online databases, both of which often provide incomplete information. Additionally, most structural aberrations (unless inherited) are unique in location, size and genetic content, making it difficult to identify genotypically similar individuals to compare and link to a phenotypic outcome. Lastly, determining the clinical effect of a chromosome abnormality is complex even for highly overlapping aberrations since the resulting phenotype is not based simply on the sum of all affected genes.

To address this knowledge gap, the Chromosome 6 Project (http://www.chromosome6.org) was established with a focus on structural aberrations affecting chromosome 6. This parent‐driven research project aims to provide parents of affected individuals with detailed information on the possible clinical consequences of their child′s aberration. The Chromosome 6 Project uses social media to reach and recruit new participants and collects detailed phenotype and genotype information directly from parents of affected children [3]. This information, supplemented with data extracted from literature reports and stored in the secure Chromosome 6 database, has been used to clinically characterise deletions in various chromosome 6 regions [3–6]. Furthermore, a study conducted within the scope of this project has demonstrated that parent‐derived phenotype information is suitable for clinical descriptions of rare chromosome disorders [7].

Building upon a successful participant recruitment and data collection strategy, we can now proceed with the next steps in the project: automating and optimising the way genotype and phenotype data are analysed, so that clinical information is readily available to parents and clinicians. In the present study, we introduce Del2Phen, a software tool that produces tailored phenotype descriptions for chromosome aberrations using the clinical features of genotypically similar individuals. Here, we also evaluate the optimal genotypic similarity parameters for chromosome 6 deletions. Since these are rare and unique aberrations, we ascertained the best way for Del2Phen to group a sufficient number of genotypically similar individuals to produce a reliable phenotype description. Even though Del2Phen was developed using data on chromosome 6 deletions, it can be applied to deletions involving other chromosomes and is easily adaptable for use on duplications.

2. Materials and Methods

Del2Phen is a command‐line Python 3 tool (installable from GitHub [https://github.com/Chromosome-6-Project/Del2Phen] or PyPI [https://pypi.org/project/del2phen]) that uses existing genotype and phenotype data to generate a phenotype description tailored to the deletion of a new patient (index) (Figure 1). The tool identifies individuals with a deletion similar to that of the index and produces a phenotype description based on the clinical features of these individuals. Genotypic similarity is determined based on preset parameters (see Section 2.2). In this study, we applied and optimized Del2Phen for use on chromosome 6 deletions using data in the Chromosome 6 database. We determined the optimal genotypic similarity parameters for chromosome 6 deletions, which produce thorough and accurate clinical descriptions based on sufficiently large groups that share a high degree of genotypic similarity.

Schematic representation of Del2Phen. The tool uses existing genotype and phenotype data as input; in this study, these are data stored in the Chromosome 6 database (left panel). Based on preset genotypic similarity parameters, Del2Phen identifies individuals with a similar deletion to that of an index patient (indicated in grey in the middle panel). Deletions are identified as similar based on concordance in dominating‐effect gene content and the maximum achievable HI‐gene similarity that yields groups that exceed a certain size (following the algorithm developed in this study and depicted in Figure 3). Details on the definitions of HI‐ and dominating‐effect genes are provided in Section 2.1. Patients identified by the tool as genotypically similar to the index are grouped together, and their phenotypes are used to give a phenotype description for the index patient (right panel).

2.1. Del2Phen Input and Annotations

Del2Phen uses existing genotype and (coded) phenotype data as input. For this study, these data are sourced from the Chromosome 6 database (Figure 1). Phenotype data were collected from Chromosome 6 Project participants (parent cohort) and eligible case reports (literature cohort) using the Chromosome 6 Questionnaire (see Supporting Information S1 for further details). This multilingual web‐based questionnaire contains 132 main questions that cover 654 phenotypic features, developmental characteristics and other types of clinical information (Table S1). All phenotypes are coded using Human Phenotype Ontology [8] (n = 423), ICD‐10 [9] (n = 11), ORPHA [10] (n = 1) or custom (n = 219) terms. Custom codes were developed to encode information for which no standard codes were available, such as growth parameters, age at onset of certain conditions or developmental milestones. Genotype data from both cohorts derive from the results of microarray or other high‐resolution genetic analyses, performed in different laboratories using various platforms. Genomic locations based on a reference genome other than hg19 are converted to hg19 using the UCSC LiftOver tool (https://genome.ucsc.edu/cgi-bin/hgLiftOver) [11].

Del2Phen annotates several aspects of each chromosome aberration, including type (deletion vs. duplication), locus (base pair position at the start and end of the aberration in hg19) and genes intersecting the aberration. For each gene, the following haploinsufficiency (HI) scores are annotated: HI score from Huang et al. [12], probability of loss‐of‐function intolerance (pLI) score based on Lek et al. [13] and predicted probability of HI (pHaplo) score from Collins et al. [14]. Each score denotes the probability of a gene having a clinical effect in case of deletions or loss‐of‐function variants. Genes with an HI score between 0% and 10%, a pLI score ≥ 0.9 or a pHaplo score ≥ 0.86 are most likely to have a phenotypic effect [12–14]. For Del2Phen, genes are considered HI genes if they fulfil at least two of these three criteria when all three scores are known or at least one of two criteria when only two scores are known. Selected HI genes known to have a prominent, highly penetrant effect on the phenotype are annotated as dominating‐effect genes in the tool. Details on the selection of these genes and additional technical information are provided in Supporting Information S1.

2.2. Workflow and Genotypic Similarity Settings

Del2Phen performs pairwise comparisons between patients using specific parameters to score their genotypic similarity. For aberrations of the same type on the same chromosome, the Jaccard index is used to measure the size, location and (HI‐)gene overlap between pairs (see example in Figure S1). For each of these parameters, minimum thresholds can be introduced. Dominating‐effect gene content is the only binary parameter, indicating whether or not the sets of affected dominating‐effect genes are equivalent (concordant) between patients. For patients harbouring multiple aberrations of the same type on the same chromosome, Del2Phen uses the total size and gene content of these aberrations for the comparison.

Given the prominent role that dominating‐effect genes play in the clinical picture, Del2Phen′s first genotypic similarity parameter is concordance in the presence or absence of these genes. Only individuals with the same affected dominating‐effect genes are identified as genotypically similar by the tool. Similarly, because of their predicted phenotypic effect, Del2Phen′s second genotypic similarity parameter is the fraction of HI genes shared between the index and each individual concordant for the selected dominating‐effect genes. This parameter, HI‐gene similarity (Figure S1), can range from 0% to 100% depending on the extent of HI‐gene overlap between deletions. Del2Phen uses these two parameters to identify individuals in the Chromosome 6 database as genotypically similar to an index, thereby generating a phenotype description tailored to that index patient (Figure 1).

2.3. Assessing Optimal HI‐Gene Similarity for Chromosome 6 Deletions

Ideally, Del2Phen would only generate a phenotype description based on individuals with 100% HI‐gene overlap, ensuring the highest level of functional similarity. However, this would result in too few genotypically similar cases. To identify the thresholds that would allow as many individuals with a chromosome 6 deletion as possible to receive a phenotype description based on highly similar deletions, we evaluated the impact of various minimum HI‐gene similarity thresholds on the (i) size of the genotypically similar group (i.e., group size or number of connections) and (ii) precision of the resulting phenotype descriptions, quantified using positive predictive value (PPV). We used 115 main features to determine PPV (features in bold in Table S1) to prevent double counting of the same feature if both the main and subfeatures were known (see Supporting Information S1 for additional criteria for the selection of these features).

PPV was calculated individually for all patients in the database who had at least five connections for any given HI‐gene similarity (each patient was used as an index once) using the following formula: PPV = number of features in the description that are also present in index/total number of features in the description, where ‘total number of features in the description’ represents the number of bold features in Table S1 that were present in ≥ 20% of the connected group and at least two individuals. The ‘at least two individuals’ criterion was added to exclude features present in only one patient in case of a minimum number of connections (n = 5). For these calculations, features not reported in literature cases were considered not present.

The objective of our assessments was to determine the extent to which we can decrease the minimum HI‐gene similarity in order to reach higher numbers of genotypically similar cases while still preserving PPV. Given the nonnormal data distribution, we used the median (50th quartile) as a measure of PPV. To ensure that the presence of a dominating‐effect gene does not affect the overall results, this metric was also determined separately for individuals with and without such a gene in their deletion, and any significant differences between these groups were evaluated using a Mann–Whitney U test. As the sample sizes in the literature and parent cohorts were insufficient for separate assessments, we examined differences in the number of phenotypic features (Mann–Whitney U test) and dominating‐effect gene content (chi‐square test) between the two cohorts to address potential confounding. All statistical analyses were performed with Python 3.10 [15] using the SciPy [16], Matplotlib [17], and Plotly packages [18].

Lastly, to demonstrate Del2Phen′s application using the genotypic similarity parameters determined in this study, we used it to generate a phenotype description for two randomly selected patients in the Chromosome 6 database: one with a dominating‐effect gene in their deletion and one without.

3. Results

3.1. Participant and Genotype Characteristics

As of November 2023, the Chromosome 6 database included detailed genotype and phenotype information for 184 project participants who had completed the Chromosome 6 Questionnaire (parent cohort) and 342 individuals described in literature (literature cohort). From these, 452 individuals (142 from the parent cohort and 310 from the literature cohort) were eligible for use by Del2Phen (Table 1 and Figure S2). Their deletions were not equally distributed throughout the chromosome (Figure S3), with the proximal 6p region having a notably low number of cases. The number of phenotypic features per patient was significantly higher in the parent cohort compared with the literature cohort (p < 0.0001) (Table 1), but there was no significant difference between cohorts in the number of cases encompassing at least one dominating‐effect gene (p = 0.28).

3.2. Assessment of Genotypic Similarity Parameters

Of the 121 genes (Table S2) that fulfilled our definition of an HI gene (Section 2.1), five were selected as dominating‐effect genes and used as the tool′s first genotypic similarity parameter: FOXC1 (Forkhead Box C1, MIM∗ 601090, HGNC: 3800), SYNGAP1 (Synaptic Ras‐GTPase‐Activating Protein 1, MIM∗ 603384, HGNC: 11497), TAB2 (TAK1‐Binding Protein 2, MIM∗ 605101, HGNC: 17075), ARID1B (AT‐rich Interaction Domain‐Containing Protein 1B, MIM∗ 614556, HGNC: 18040) and DLL1 (Delta‐Like Canonical Notch Ligand 1, MIM∗ 606582, HGNC: 2908). Table S3 gives an overview of the phenotypes associated with HI of these genes.

As expected, the number of individuals who exceed a certain number of connections (5, 15, 20 or 30) decreases as we increase the minimum HI‐gene similarity threshold (Table 2 and Figure S4). Conversely, the overall median PPV, calculated across all eligible patients, shows a positive trend with HI‐gene overlap, peaking at 0.333 when the minimum HI‐gene similarity is set to 70% or higher (Table 2). Interestingly, the increase in median PPV with HI‐gene similarity is only marginal in groups containing a dominating‐effect gene (Figure 2a), with PPV being significantly higher in these groups at lower HI‐gene similarity thresholds (≤ 50%) when compared with groups without such a gene (Table S4). This difference is no longer statistically significant when the minimum HI‐gene similarity is at least 60%.

Figure 2. Median PPV based on dominating‐effect (DE) gene content (a) and minimum group size (b) for various minimum HI‐gene similarity thresholds. (a) Median PPV for individuals harbouring a DE gene in their deletion exhibits only slight variation in response to changes in the minimum HI‐gene similarity threshold. In contrast, for individuals who do not lack such a gene, median PPV notably increases when the minimum HI‐gene similarity threshold surpasses 50%. (b) When comparing groups of different sizes, larger groups (≥ 15 or ≥ 20 individuals) tend to exhibit higher median PPVs, particularly at higher HI‐gene similarities (≥ 80%). At these similarity thresholds, increasing the minimum group size from 20 to 30 does not affect PPV. For HI‐gene similarity thresholds of ≥ 60% and ≥ 70%, groups of ≥ 15 and ≥ 20 individuals exhibit the same median PPV (0.333).(a)(b)

When evaluating groups based on different minimum sizes, the highest median PPV (0.4) occurs in individuals with ≥ 20 connections at HI‐gene similarity thresholds of ≥ 90% and 100%, and increasing the group size to ≥ 30 does not further improve this metric (Table 2 and Figure 2b). When the minimum HI‐gene similarity is < 80%, the differences between groups of different sizes become less pronounced. At ≥ 60% and ≥ 70% HI‐gene similarity, groups containing ≥ 15 individuals and those with ≥ 20 exhibit a similar median PPV (0.333) (Figure 2b), with a minimum group size of 15 individuals sufficient to achieve a statistically significant increase in PPV compared with smaller groups at these thresholds (p values provided in Table S5).

3.3. Del2Phen′s Default Genotypic Similarity Settings for Chromosome 6 Deletions

Considering the results above, we determined the genotypic similarity algorithm depicted in Figure 3 to guide Del2Phen′s generation of phenotype descriptions for chromosome 6 deletions. Under the default settings, at HI‐gene similarity ≥ 80%, Del2Phen will provide a clinical description with the highest achievable functional similarity for groups of ≥ 20 individuals. At 60% and 70% HI‐gene similarity, the group size cut‐off will be reduced to 15. Below these similarity thresholds, phenotype descriptions will only be generated if a dominating‐effect gene is present, coupled with the highest HI‐gene similarity that would yield groups of at least 15. The rationale behind this algorithm is discussed in Section 4.3.

Genotypic similarity algorithm for Del2Phen′s generation of phenotype descriptions for chromosome 6 deletions. The tool will leverage the maximum possible functional similarity within groups that exceed a certain size for generating a comprehensive phenotype description. At HI‐gene similarity ≥ 80%, a description will be provided for groups of at least 20 individuals. The group size cut‐off will be lowered to 15 for HI‐gene similarities of at least 70% and 60%. Below these similarity thresholds, phenotype descriptions will only be given if a dominating‐effect (DE) gene is present, coupled with the highest HI‐gene similarity yielding groups of at least 15 individuals. In each step, n represents the number of additional individuals in the Chromosome 6 database who become eligible for a phenotype description as the algorithm progresses through the step, whereas N represents the total number of individuals who would or would not receive a phenotype description based on these settings and our current data pool.

Tables 3 and S6 show the phenotype descriptions for two patients from the Chromosome 6 database generated by Del2Phen based on this algorithm. The first patient (individual Id226) has a 2.4‐Mb 6q24.3q25.1 deletion that includes the dominating‐effect gene TAB2. Del2Phen identified 17 individuals in the Chromosome 6 database (two Chromosome 6 Project participants and 15 literature cases) who shared the same dominating‐effect gene and at least 70% of their HI‐gene content with this patient. The phenotype description for this patient (Table 3) consists of 25 features present in at least 20% of the connected group. Of these, 12 features were also seen in the index, leading to a PPV of 0.48 based on the full description and a PPV of 0.6 when considering only the selected features used to determine the tool′s setting (Table S1).

The second phenotype description (Table S6) was generated for a patient with an 11.1‐Mb deletion in the 6q13q14.1 region that does not span a dominating‐effect gene. The description is based on the phenotypes of 16 individuals (five project participants and 11 literature cases) with a minimum of 70% overlap in their HI‐gene content. The clinical description for this patient was lengthier, encompassing 38 features, and resulted in a PPV of 0.45, rising to 0.5 when using only the selected features in Table S1.

4. Discussion

We have introduced Del2Phen, an innovative tool developed to provide tailored phenotype descriptions for chromosome aberrations. In this study, Del2Phen was applied to deletions on chromosome 6 based on the data collected through the Chromosome 6 Project. We have described the process through which Del2Phen′s genotypic similarity parameters were evaluated, with the aim of determining its definitive settings for chromosome 6 deletions. Below, we discuss this optimisation process, the rationale behind Del2Phen′s genotypic similarity algorithm and the future steps for this tool to be implemented in practice.

4.1. Phenotype Descriptions for Chromosome Deletions

Defining the functional similarity of chromosome aberrations is complex because of our still incomplete understanding of the function of many genes and the clinical consequences in the case of deletions. Although websites aimed at professionals, such as DECIPHER [19], often allow users to set the functional similarity range, Del2Phen will eventually be incorporated into a website primarily for parents. It is therefore important to preset the genotypic similarity parameters that will be used to generate phenotype descriptions. For Del2Phen, we utilised current knowledge of the predicted HI effect of genes and known gene–phenotype associations to define genotypic similarity.

Because of the dominating role of the genes in Table S3 in the clinical picture, we expect individuals harbouring identical sets of these genes to have similar phenotypes. The observed higher PPV in groups harbouring a dominating‐effect gene even at otherwise low HI‐gene similarity (Figure 2a and Table S4) confirms the significant role of these genes in the clinical outcome and supports our decision to use concordance in dominating‐effect gene content as Del2Phen′s first genotypic similarity parameter. This is also illustrated by the example in Table 3, where all the features associated with a deletion of the dominating‐effect gene TAB2 (Table S3) were included in the phenotype description, and most were present in the index.

The second parameter, the HI‐gene overlap between deletions, would ideally be set to 100%. However, the limited number of individuals who have fully overlapping deletions means that such high similarity would often result in phenotype descriptions based on small patient groups, if generated at all. Given the role of individual variation in the resulting phenotype and the incomplete penetrance of most observed features [20], it was necessary to lower the HI‐gene similarity cut‐off to enable a larger number of individuals to receive a phenotype description and increase the sample size on which these descriptions would be based. Concurrently, it was important to ensure that adjusting the HI‐gene similarity would not compromise the reliability of Del2Phen′s outcome.

4.2. Evaluating Phenotype Descriptions

Assessing the accuracy of phenotype descriptions in the context of structural chromosome aberrations is inherently complex, as even identical deletions can result in an array of phenotypic manifestations [21, 22]. To gain insights into the relevance of features in the descriptions, we used PPV as an outcome measure to describe the fraction of features in the clinical description that were in accordance with the phenotype of the index. Remarkably, PPV did not drop below 0.2 when setting the HI‐gene similarity threshold to 0% (Figure 2). Although this may seem counterintuitive, these thresholds only indicate a minimum level of similarity; for example, at a threshold of 10%, groups are formed of individuals who share 10% or more of their deletion′s HI‐gene content. Increasing the minimum threshold results in individuals with lower levels of similarity being excluded from the groups, only retaining those sharing a greater fraction of HI genes. This, in turn, contributes to a higher description PPV as the minimum HI‐gene similarity increases, reaching a median of 0.333 for functionally identical deletions (Figure 2).

A maximum median PPV of 0.333 may seem low, but we have to consider the low a priori chance of multiple features occurring together in a single patient (index) given the reduced penetrance of many of the observed features and the variable expressivity associated with these aberrations [20, 21]. For instance, if a characteristic is present in 25% of the genotypically similar group, it will be included in the phenotype description under our current settings but may not be present in the index patient, thus decreasing PPV. We believe that it is correct to include these features in the description to ensure comprehensiveness and not miss important findings, even though this might lower the tool′s precision metrics. Furthermore, the decrease in group size at such high HI‐gene similarity (Figure S4) can also influence PPV.

Another potential factor is the variability in phenotypes associated with deletions in certain chromosome regions. Although deletions in certain areas are linked to highly distinct phenotypes [5], deletions in other regions produce a much more diverse clinical outcome [6], thereby affecting the overall PPV. However, as Del2Phen′s similarity settings will be applied uniformly across the entire chromosome, PPV was determined for deletions in all regions.

Lastly, our relatively modest PPV might also be influenced by the amount and diversity of phenotypic information available for our parent cohort. In contrast to literature reports, in which authors are often restricted in the amount and kind of information they present (e.g., presenting only clinically significant issues), data collected via the Chromosome 6 Questionnaire yield much more detailed information (Table 1) [7]. Since almost all groups consist of both literature cases and Chromosome 6 Project participants, one subset of the group will have fewer but very specific characteristics (literature cohort), whereas the rest will present with a much more diverse phenotype (parent cohort), thus lowering PPV. Unfortunately, it was not possible to conduct the analysis separately for the two cohorts. Nevertheless, we can conclude that the extensive information collected from our parent cohort makes the phenotype descriptions generated by Del2Phen more complete than what can be derived solely from literature reviews.

4.3. Determining Del2Phen′s Genotypic Similarity Settings for Chromosome 6 Deletions

Following Del2Phen′s genotypic similarity algorithm (Figure 3), a phenotype description will be generated based on groups of ≥ 20 individuals sharing the highest attainable HI‐gene fraction in the range of 80%–100%. This minimum group size threshold is justified by our observations that PPV is higher in groups of ≥ 20, with no notable improvements when increasing group size to ≥ 30 (Table 2 and Figure 2b). At the minimum HI‐gene similarity thresholds of 60% and 70%, the median PPV remained constant when lowering the group size to ≥ 15 individuals (Table 2), while still being significantly higher compared with groups of < 15 (Table S5). For this reason, the group size cut‐off was lowered to at least 15 at these similarity thresholds, allowing Del2Phen to provide phenotype descriptions for a larger fraction of individuals in the database, as well as future new patients (Figure S4).

If an insufficient number of patients in the database share ≥ 60% of their HI genes, Del2Phen will still provide a description for deletions that include a dominating‐effect gene, based on the highest possible HI‐gene overlap that would yield groups of ≥ 15 (Figure 3). This setting is based on the expectation that the phenotype in these groups would be primarily determined by the deletion of the dominating‐effect gene rather than by the HI‐gene similarity, as supported by our findings in Figure 2a and Table S4. At HI‐gene similarity ≥ 60%, the difference in PPV between groups with and without a dominating‐effect gene is no longer significant, suggesting that this fraction of shared HI genes is sufficient to produce effects similar in impact to that of a dominating‐effect gene.

Based on these settings and our current data pool, 250 individuals (55% of our current cohort) will be eligible to receive an automatically generated phenotype description (Figure 3). As the Chromosome 6 database continues to grow, the number of individuals eligible for a phenotype prediction will also increase.

4.4. Strengths and Limitations

Del2Phen is an innovative phenotype description tool, developed primarily for delivering clinical information to parents. Unlike existing online tools, which are tailored to professionals and require a certain level of expertise, Del2Phen will deliver information via a parent‐friendly website. Naturally, it can also be a valuable resource for healthcare professionals in counselling patients and their families. The clinical descriptions provided will be detailed and thorough compared with the limited information in current databases or literature reports, because Del2Phen uses the extensive well‐curated data collected via the Chromosome 6 Project (Table 1) [7]. Even though it is currently optimised for chromosome 6, it can also be applied to other chromosomes if given the correct input and could thus benefit people affected by a wider spectrum of chromosome disorders.

Since Del2Phen will be periodically updated with the latest information on HI or the clinical effect of genes, the clinical information it produces will be more up to date than that in scientific articles, which reflect knowledge at the time of writing. Although Del2Phen does not specifically account for noncoding features, which are often positioned on fixed locations on the genome, their effects will be incorporated once more evidence is available. Lastly, this tool can also be applied in research settings to expedite data analysis.

One limitation of Del2Phen pertains to the fact that its output can be sensitive to variability in resolution of the genotype data, which derives mostly from microarray analysis conducted using different platforms. We do not expect this to affect the outcome of this study, as only individuals for whom a high‐resolution array was available were included. Furthermore, an HI gene located between the last absent and first present probes (or vice versa) was an exclusion criterion for the current study as it made the deletion of that gene unclear (Supporting Information S1), thus ensuring that the grouping of genotypically similar individuals would not be affected. However, as most new cases are diagnosed with higher resolution techniques, we expect this issue to become increasingly rare in the future.

Clinical data missing from literature reports might have an impact on the thoroughness of the phenotype descriptions. To avoid falsely increasing the prevalence of certain features, we considered features that were not mentioned in the literature reports as not present in the patients. We expect this assumption to hold true for highly clinically relevant features, but articles that focus on specific topics, like heart or brain defects, may sometimes omit unrelated phenotypes.

Finally, it is important to note that we must always be cautious when presenting clinical information on chromosome aberrations, whether derived from automatic or manual analysis. The unique nature of these aberrations, paired with the phenotypic variability, means that not all features will be seen in all patients. Such disclaimers will be added to our website, together with instructions for reading the clinical descriptions, links to websites with support information and the option to download the medical information for discussion with healthcare professionals.

4.5. Future Steps

The next step for Del2Phen is validating its output by comparing its outcomes to those from manual data analysis, the conventional approach for characterising the phenotypes of chromosome deletions. Following this, the tool will be deployed to describe the phenotypes of new, uncharacterised regions. In parallel, the development of the interactive website for parents will also commence. In a running study, with the help of involved parents and professionals, we are investigating the best way to report the clinical features on the website. The study′s findings will inform the website′s content and structure, including the minimum prevalence of clinical features in individuals with a similar aberration required for inclusion in the description. Del2Phen will also undergo periodic updates incorporating the latest knowledge on HI, gene–disease associations and genotype–phenotype associations of CNVs. Lastly, although Del2Phen currently focuses solely on deletions, it will be modified for use on duplications once enough data on these aberrations are collected.

Ultimately, our goal is for this tool to be applied not only to structural aberrations on chromosome 6 but eventually across the entire genome. To achieve this, sufficient genotype and phenotype data must be collected for aberrations on other chromosomes. The Chromosome 6 Project has already demonstrated effective strategies for collecting these data, which can serve as a model for expanding the scope of Del2Phen.

5. Conclusion

We have introduced Del2Phen, a novel phenotype description tool for chromosome aberrations. Descriptions are based on features present in individuals identified as genotypically similar by the tool. Genotypic similarity parameters were determined by leveraging existing knowledge on gene–phenotype relationships and the HI effect of genes. Furthermore, we described the optimisation process for this tool to produce thorough clinical descriptions for the full range of chromosome 6 deletions using data collected through the Chromosome 6 Project. Del2Phen will be incorporated into a website aimed at parents of affected children, providing them with essential health information in an efficient and timely manner and aiding clinical care. Additionally, this tool can be applied to deletions on other chromosomes and can be easily adapted for duplications when sufficient data are available, thus expediting data analysis for any rare chromosome disorder.

Ethics Statement

A complete ethical evaluation was waived by the accredited Medical Ethics Review Committee of the University Medical Center Groningen. According to Dutch guidelines, research conducted anonymously with existing data without performing additional investigations does not fall under scientific research as described in the Medical Research Involving Human Subjects Act (WMO) and therefore does not require ethical approval.

Informed consent for participation in the study and publication of the results was obtained from all participants as part of signing up for the Chromosome 6 Project. The project′s privacy statement is available on the website: http://chromosome6.org/privacy-statement.

Disclosure

A preprint version of this article has previously been published [23]. The final version of this manuscript has been approved by all authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Conceptualisation: C.M.A.R‐A. and A.E. Methodology: C.M.A.R‐A., L.F.J., E.R., T.D.M. and A.E. Software: T.D.M., M.K.S. and M.A.S. Data curation: E.R. and T.D. Formal analysis: T.D.M. and E.R. Writing—original draft: E.R. Writing—review and editing: T.D.M., C.M.A.R‐A., M.K.S., L.F.J., A.E., T.D. and M.A.S. Visualisation: E.R. and T.D.M. Supervision: C.M.A.R‐A., L.F.J. and A.E. Funding acquisition: E.R., A.E., C.M.A.R‐A. and M.A.S. L.F.J. and A.E. contributed equally to this work.

Funding

This study was supported by the Universitair Medisch Centrum Groningen (10.13039/501100005075), ZonMw (10.13039/501100001826, 113312101), Nederlandse Organisatie voor Wetenschappelijk Onderzoek (10.13039/501100003246, 917.164.455) and Crowdfunding from Chromosome 6 Project.

Supporting Information

Additional supporting information can be found online in the Supporting Information section.

Supporting information

Supporting Information 1 Supplementary Methods. Figure S1: Determining the HI‐gene overlap between deletions. Figure S2: Inclusion flowchart of cases for the current study. Figure S3: Distribution of deletions along Chromosome 6. Figure S4: Number of individuals with more than n connections for a minimum HI‐gene similarity threshold. Table S3: Del2Phen′s current dominating‐effect genes. Table S4: Results of Mann–Whitney U test for the difference in PPV in patients with and without a dominating‐effect gene in their deletion for various HI‐gene similarity thresholds. Table S5: p values derived from Mann–Whitney U test for differences in PPV in groups that exceed different sizes. Table S6: Example of a clinical description generated by Del2Phen for a patient without a dominating‐effect gene in the deletion.

Supporting Information 2 Table S1: Phenotypic features and other clinical characteristics covered by the Chromosome 6 Questionnaire.

Supporting Information 3 Table S2: Genes on Chromosome 6 that fulfil our definition of HI and are considered HI genes by Del2Phen.

Supporting Information 4 README: Del2Phen.

Bibliography22

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Pelentsov L. J. , Laws T. A. , and Esterman A. J. , The Supportive Care Needs of Parents Caring for a Child With a Rare Disease: A Scoping Review, Disability and Health Journal. (2015) 8, no. 4, 475–491, 10.1016/j.dhjo.2015.03.009, 2-s 2.0-84943815861, 25959710.25959710 · doi ↗ · pubmed ↗
2Smits R. M. , Vissers E. , te Pas R. , Roebbers N. , Feitz W. F. J. , van Rooij I. A. L. M. , Blaauw I.de, and Verhaak C. M. , Common Needs in Uncommon Conditions: A Qualitative Study to Explore the Need for Care in Pediatric Patients With Rare Diseases, Orphanet Journal of Rare Diseases. (2022) 17, no. 1, 10.1186/s 13023-022-02305-w, 35379257.PMC 898167535379257 · doi ↗ · pubmed ↗
3Engwerda A. , Frentz B. , den Ouden A. L. , Flapper B. C. T. , Swertz M. A. , Gerkes E. H. , Plantinga M. , Dijkhuizen T. , and van Ravenswaaij-Arts C. M. A. , The Phenotypic Spectrum of Proximal 6q Deletions Based on a Large Cohort Derived from Social Media and Literature Reports, European Journal of Human Genetics. (2018) 26, no. 10, 1478–1489, 10.1038/s 41431-018-0172-9, 2-s 2.0-85048584621, 29904178.29904178 PMC 6138703 · doi ↗ · pubmed ↗
4Rraku E. , Kerstjens-Frederikse W. S. , Swertz M. A. , Dijkhuizen T. , van Ravenswaaij-Arts C. M. A. , and Engwerda A. , The Phenotypic Spectrum of Terminal and Subterminal 6p Deletions Based on a Social Media-Derived Cohort and Literature Review, Orphanet Journal of Rare Diseases. (2023) 18, no. 1, 10.1186/s 13023-023-02670-0, 36964621.PMC 1003951936964621 · doi ↗ · pubmed ↗
5Engwerda A. , Kerstjens-Frederikse W. S. , Corsten-Janssen N. , Dijkhuizen T. , and van Ravenswaaij-Arts C. M. A. , The Phenotypic Spectrum of Terminal 6q Deletions Based on a Large Cohort Derived From Social Media and Literature: A Prominent Role for DLL 1, Orphanet Journal of Rare Diseases. (2023) 18, no. 1, 10.1186/s 13023-023-02658-w, 36935482.PMC 1002485136935482 · doi ↗ · pubmed ↗
6Engwerda A. , Leenders E. K. S. M. , Frentz B. , Terhal P. A. , Löhner K. , de Vries B. B. A. , Dijkhuizen T. , Vos Y. J. , Rinne T. , van den Berg M. P. , Roofthooft M. T. R. , Deelen P. , van Ravenswaaij-Arts C. M. A. , and Kerstjens-Frederikse W. S. , TAB 2 Deletions and Variants Cause a Highly Recognisable Syndrome With Mitral Valve Disease, Cardiomyopathy, Short Stature and Hypermobility, European Journal of Human Genetics. (2021) 29, no. 11, 1669–1676, 10.1038/s 414 · doi ↗ · pubmed ↗
7Engwerda A. , Frentz B. , Rraku E. , de Souza N. F. S. , Swertz M. A. , Plantinga M. , Kerstjens-Frederikse W. S. , Ranchor A. V. , and van Ravenswaaij-Arts C. M. A. , Parent-Reported Phenotype Data on Chromosome 6 Aberrations Collected via an Online Questionnaire: Data Consistency and Data Availability, Orphanet Journal of Rare Diseases. (2023) 18, no. 1, 10.1186/s 13023-023-02657-x, 36935495.PMC 1002483036935495 · doi ↗ · pubmed ↗
8Köhler S. , Gargano M. , Matentzoglu N. , Carmody L. C. , Lewis-Smith D. , Vasilevsky N. A. , Danis D. , Balagura G. , Baynam G. , Brower A. M. , Callahan T. J. , Chute C. G. , Est J. L. , Galer P. D. , Ganesan S. , Griese M. , Haimel M. , Pazmandi J. , Hanauer M. , Harris N. L. , Hartnett M. J. , Hastreiter M. , Hauck F. , He Y. , Jeske T. , Kearney H. , Kindle G. , Klein C. , Knoflach K. , Krause R. , Lagorce D. , Mc Murry J. A. , · doi ↗ · pubmed ↗