Large-scale mouse mutagenesis identifies novel genes affecting vertebral anatomy
Ximena Ibarra-Soria, Elizabeth Webb, John F. Mulley

TL;DR
This study identifies 204 new genes in mice that affect vertebral anatomy, offering insights into potential causes of spinal malformations in humans.
Contribution
The study presents a large-scale analysis of mouse genes revealing novel associations with vertebral anatomy and development.
Findings
204 genes were found to alter vertebral anatomy, categorized into six distinct phenotype groups.
182 of these genes are expressed in somites, with 60% showing variable expression during maturation.
Fourteen genes have vertebral phenotypes as their only phenotype, and 34 genes have vertebral phenotypes as ≥50% of their total phenotypes.
Abstract
We analyzed the International Mouse Phenotyping Consortium (IMPC) release 19 set of 8,539 phenotyped whole-gene knockouts to identify 204 genes that alter vertebral anatomy and development. These genes are broadly grouped into six categories based on their phenotype: “vertebral number” (22 genes); “vertebral processes” (35 genes); “spine shape” (16 genes); “tail morphology” (73 genes); “vertebral form” (62 genes); and “somitogenesis” (24 genes), with minimal overlap between groups. Gene expression analysis of somite trios across six developmental stages show that 182 of these genes are expressed in somites, and 60% of them show variable expression during somite maturation. A further 54% show expression changes between developmental stages. Fourteen of the 204 genes affecting vertebral anatomy have a vertebral phenotype as their only phenotype, and for 34 genes vertebral phenotypes…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 10
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDevelopmental Biology and Gene Regulation · Spine and Intervertebral Disc Pathology · Craniofacial Disorders and Treatments
Introduction
Our vertebral column defines us as vertebrates, and performs dual roles in supporting the body and providing a conduit for the nervous system. Vertebrae are not uniform, and in mammals are broadly classified into five major groups: cervical, thoracic, lumbar, sacral, and caudal. These vertebrae form from bilaterally paired blocks of paraxial mesoderm called somites (Fig. 1a), which bud off from the anterior of the presomitic mesoderm starting from around 3 weeks of development in humans (Carnegie stage 9), and after around 8 days post coitus in the mouse (Theiler stage 12), at a rate of one roughly every 4–6 h in humans, and every 2 h in mice (Tam 1981; William et al. 2007). Somites consist of rostral and caudal portions, and each vertebra is formed from the caudal portion of one somite and the rostral portion of the following somite through a program of resegmentation (Fig. 1b) (Remak 1851; Aoyama and Asamoto 2000; Ward et al. 2017; Criswell and Gillis 2020). Somites are further compartmentalised into the sclerotome, which gives rise to vertebral and rib cartilage and the associated tendons and ligaments, and the dermomyotome, which will give rise to the muscles and dermis of the back (Christ et al. 2004; Scaal 2016; Draga and Scaal 2024). Cells within the sclerotome behave differently depending on location, and contribute to different parts of the vertebra: those in the central sclerotome differentiate in place (i.e. without migration towards the neural tube or notochord) and form the transverse process, proximal parts of the rib, and the pedicle of the vertebral arch; cells in the ventral sclerotome migrate towards the notochord and give rise to the vertebral body; dorsal sclerotome cells migrate towards the dorsal side of the neural tube and form the vertebral arches and associated spinous processes; and lateral sclerotome cells form the distal parts of the ribs (Fig. 1c) (Draga and Scaal 2024). In the mouse, vertebral ossification begins around embryonic day 14 (e14.5). The arches begin to ossify before the centra, with the process starting from two ossification centres, one in the cervical region and one in the lower thoracic region (Hautier et al. 2014), and progressing in cranial and caudal directions from these. Developing vertebrae possess primary ossification centres in both the arches and the centrum, and so a vertebra has three ossification centres (Kaplan et al. 2005).
Fig. 1A Somites are blocks of paraxial mesoderm that are formed in a clock-like fashion. B There are seven cervical vertebrae in the mouse (formed from somites 5 to 12); 13 thoracic vertebrae (somites 12–25); 6 lumbar (somites 25–31); 4 sacral (somites 31–35) and around 30 caudal/tail vertebrae (somites 35 onwards). Vertebral boundaries are in the middle of somites, as each vertebra is formed from the rostral portion of one somite and the caudal portion of the preceding somite through a program of resegmentation. Anterior is to the top, and posterior towards the bottom of the figure. C Within somites, the dermomyotome gives rise to the back muscles and dermis, and the sclerotome, comprising dorsal, ventral, central and lateral portions, gives rise to the spinous processes; vertebral bodies; transverse processes, proximal ribs and pedicles; and distal ribs
Each vertebra therefore requires the proper alignment and interaction of two opposing somites. Given the complexity of the processes of somitogenesis, resegmentation, sclerotome differentiation, and vertebral ossification, it is not surprising that things can sometimes go wrong, resulting in major health impacts. Malformations of the spine can be classified into three major groups: (i) neural tube defects, where the neural tube fails to close; (ii) defects or failures of formation, where a structural element of one or more vertebrae is missing (resulting in hemivertebrae), or where chondrification or ossification centres do not form properly (resulting in wedge vertebrae); and (iii) segmentation defects, where two or more adjacent somites fail to separate or resegment, resulting in block or bar vertebrae. (Kaplan et al. 2005; Trenga et al. 2016). Both formation and segmentation defects can lead to issues with spinal curvature such as congenital scoliosis (Pahys and Guille 2018), and congenital vertebral malformations are thought to occur at a rate of around 0.13–0.5 per 1,000 births or more (Brand 2008; Giampietro et al. 2009; Szoszkiewicz et al. 2024), with hemivertebrae at 1–10 per 10,000 live births (Szoszkiewicz et al. 2024), and congenital scoliosis at 0.5–1.5 per 1,000 births (Sebaaly et al. 2022). In addition to these structural problems, there can also be deviations from the “typical” number of vertebrae within a species through meristic changes or homeotic transformation from one vertebral type into another. Where this transformation is not complete, transitional vertebrae bearing characteristics of two adjacent vertebral types (e.g. thoracic/lumbar, lumbar/sacral) are formed. Numerical variation has been suggested to occur in between 7.7 and 20% of human patients (Hahn et al. 1992; Akbar et al. 2010; Tins and Balain 2016), and transitional vertebrae may occur at anywhere between 1 and 30% of patients (McCulloch and Waddell 1980; O’Driscoll et al. 1996; Chang and Nakagawa 2004; Delport et al. 2006; Akbar et al. 2010; Tureli et al. 2014).
The mouse has long been a model for understanding vertebral development. It has been almost one hundred years since Nadine Dobrovolskaya-Zavadskaya described the short-tail phenotype in mice heterozygous for a mutation in the Brachyury gene (Korzh and Grunwald 2001; Korzh 2023), and over 70 years since a series of research projects quantified the extent of numerical variation in the mouse spine (Grüneberg 1950, 1954; Green 1951, 1953; Green and Russell 1951; McLaren and Michie 1954, 1955, 1956) and demonstrated that the uterine environment could alter vertebral patterning (McLaren and Michie 1958a, b). Mouse mutants provided support for the role of Hox genes in vertebral patterning (Krumlauf 1994; Burke et al. 1995; Wellik and Capecchi 2003; Carapuço et al. 2005; Wellik 2007), and mouse experiments highlighted the potential for interruption of this process by in utero exposure to retinoic acid (Kessel and Gruss 1991). Despite this long history of research, there is still a great deal we can learn about mammalian vertebral patterning from the mouse.
The International Mouse Phenotyping Consortium (IMPC, https://www.mousephenotype.org/) is a multinational effort to produce and phenotype whole gene knock out mouse lines, with the eventual aim of generating a null mutation for every mouse gene, and with an initial focus on genes for which there is not already a reported knock-out (Brown and Moore 2012; Brown et al. 2018; Groza et al. 2023). The mice generated to date have been used to find traits showing sexual dimorphism (Karp et al. 2017), and to identify essential genes (Dickinson et al. 2016) or those associated with: hearing loss (Bowl et al. 2017); metabolic diseases (Rozman et al. 2018); bone mineral density (Swan et al. 2020); and eye development (Moore et al. 2018; Chee et al. 2023). Here we use the May 2023 IMPC release (release 19) to identify genes that result in altered vertebral development.
Methods
Release 19.0 data overview
The IMPC release 19.0 data were downloaded from the IMPC ftp site (https://ftp.ebi.ac.uk/pub/databases/impc/all-data-releases/release-19.0) to provide overall context for focussed vertebral analyses.
Identification and characterisation of candidate genes
We used the IMPC programmatic data access portal application programming interface (API, https://www.mousephenotype.org/help/programmatic-data-access/) to download all records resulting from the digital X-ray imaging (XRY); gross embryo morphology at E9.5 (GEL); E12.5 (GEM); E14.5-E15.5 (GEO); E18.5 (GEP); combined SHIRPA ((SmithKline, Harwell, Imperial College, Royal Hospital, Phenotype Assessment) and dysmorphology (CSD); and body composition (DXA) IMPC release 19 phenotyping pipelines. From these data we identified 76 parameters (defined by IMPC as “a measurement obtained during an experiment”) related to the skeleton, and then parsed these to only those that contained ≥ 1 gene and that pertained to the vertebral column (i.e. removed those that pertained to teeth, the skull, limbs, or the ribcage), resulting in a set of 204 genes marked as ‘significant = TRUE’ by the IMPC framework that alter skeletal development in heterozygous, hemizygous, or homozygous knockouts.
A gene was called significant in males (or females) if any row for that gene had ‘male_ko_effect_p_value’ < 1 × 10⁻⁴ or ‘female_ko_effect_p_value’ < 1 × 10⁻⁴. Categories reported were: “male-only”, “female-only”, “both sexes”, “not considered” (studies run without sex strata), and “considered but not significant” (sex-stratified rows present, but neither sex met the p-threshold). Classification as “sexually dimorphic” required a sex x genotype interaction p-value < 1 × 10⁻⁴ in any row for the gene. None met this criterion. We screened for a ‘genotype main-effect’ p-value < 1 × 10⁻⁴ and ‘no significant sex x genotype interaction’ (interaction p ≥ 1 × 10⁻⁴). None met this stricter “main-effect only” definition in this filtered set.
Zygosity data was taken from the zygosity data field and tallied across significant rows. Life stage was classified as “embryo” or “adult” based on the procedure (embryo gross morphology at E9.5/E12.5/E14.5–15.5.5/E18.5 vs. adult X-ray/SHIRPA/dysmorphology).
We estimated phenotype penetrance using per-animal, categorical observations from the IMPC programmatic API core for release 23 (April 2025), as these analyses were conducted after our initial study of release 19. For each query we retrieved only mutant animals (biological_sample_group = experimental) with categorical outcomes (observation_type = categorical) explicitly labelled abnormal or normal at the individual-animal level. Records labelled “imageOnly” or other non-call categories were excluded from penetrance tallies. Analyses were conducted at the finest relevant granularity: gene × parameter × life stage × sex × zygosity. Life stage was harmonized to embryo versus adult using life_stage_name and (where needed) metadata strings containing “embryo.” Zygosity labels were normalized to {homozygote, heterozygote, hemizygote}. For sex-pooled estimates, male and female mutants were aggregated within the same gene × parameter × life stage × zygosity stratum. We counted mutant animals as “abnormal” and “normal”, and calculated penetrance as number of abnormal animals divided by the total. We report exact 95% binomial (Clopper–Pearson) confidence intervals for as pen_low95 and pen_high95. Unless otherwise stated, we required a minimum of N ≥ 3 mutant observations for summary statistics.
Genomic clustering
We next tested whether genes with significant vertebral phenotypes are physically clustered in the mouse genome more than would be expected by chance. Genomic locations of all mouse genes and the vertebral subset (n = 204) were based on GRCm39 from Ensembl BioMart (https://www.ensembl.org/biomart/martview), and we retained standard chromosomes (1–19, X, Y), removed entries with missing coordinates, and de-duplicated on gene symbol by keeping the record with the smallest start coordinate. For each gene, we asked whether it has at least one other target gene on the same chromosome within a fixed genomic window w (tested windows: 1 Mb, 2 Mb, 5 Mb, 10 Mb). The observed statistic for each window is the number of target genes that have ≥ 1 target neighbour within w. To estimate the null distribution, we sampled 5,000 random gene sets of size 204 without replacement from all annotated genes. For each random set and window w, we recomputed the same “genes with a neighbour” count, outputting the permutation mean and SD, a z-score (obs − µ)/σ, and a Monte-Carlo right-tail p-value as the fraction of permutations with counts ≥ observed (with + 1 smoothing).
Functional analysis
We first performed a rough clustering of Gene Ontology terms associated with these genes using ReviGO (Supek et al. 2011), based on gene annotations from the Database for Annotation, Visualization and Integrated Discovery (DAVID 2021) functional annotation tool (v2023q4, (Sherman et al. 2022), in the GOTERM_BP_FAT category and restricted to terms with modified Fisher exact p-values < 0.05. We next analysed gene function enrichment using the PANTHER (Protein ANnotation THrough Evolutionary Relationship) classification system (Version 18.0 released 2023-08-01 (Mi et al. 2013), for the biological process (BP), molecular function (MF), and cellular component (CC) categories, with comparison to both the full mouse gene set (21,983 genes in PANTHER v18.0) and the smaller IMPC v19 gene set (8,539 genes), focussing only on results with a false discovery rate (FDR) of p < 0.05. Protein-protein interactions between candidate genes were predicted using the STRING (Search Tool for Recurring Instances of Neighbouring Genes) database(Szklarczyk et al. 2019), with MCL clustering (Brohée and van Helden 2006), and restriction to high (0.70–0.89) or highest (0.90–1.0) confidence interactions.
Somite data methods
To assess whether the 204 genes might be involved in the development of the skeletal system, we checked their expression levels in the dataset from Ibarra-Soria et al. 2023 (batch corrected gene expression levels were downloaded from ArrayExpress E-MTAB-12511). We also cross-referenced against the set of genes reported as significantly differentially expressed between somites I, II and III, or between stages (downloaded from https://github.com/xibarrasoria/somitogenesis2022).
Gene analysis
For genes where a vertebral phenotype was their only significant phenotype, and those where vertebral phenotypes represented ≥ 50% of all phenoptyes, we looked for previous associations with vertebral development through searches of PubMed (www.pubmed.gov) using the approved mouse gene symbol by itself, and the gene symbol with the Boolean operator “AND” and the search terms ‘vertebra*’, ‘somite’, or ‘skeletal’. We looked for previous links to scoliosis and spine shape in the same way, using the search terms “gene symbol AND kyphosis”; “gene symbol AND lordosis”; and “gene symbol AND scoliosis”. We also searched Google Scholar (www.google.com/scholar), using the gene symbol (in quotation marks to ensure exact match) and the above search terms, but this tended to result in large numbers of results due to spurious matches.
Results
We analysed the May 2023 IMPC release (release 19), which comprises 8,539 phenotyped genes and nearly 60,000 phenotype hits. Given these numbers, some degree of pleiotropy is to be expected (the term pleiotropy is utilised here as “the effect of a gene or a variant on multiple traits” (Muñoz-Fuentes et al. 2022), and indeed, the average number of phenotypes per gene is 4.43 (median 3), ranging from 0 to 77, with 1,460 genes having no phenotype assigned, and 1,282 genes a single phenotype (Fig. 2).
Fig. 2. The majority of genes phenotyped by the IMPC are associated with more than one phenotype. Red line = mean (4.43). Median is 3
Not every gene has been tested for every phenotype, and some anatomical systems are better represented than others. Of the 7,804 genes that have been tested for a skeletal phenotype, 945 showed a significant effect (Fig. 3a). From these, 864 showed a skeletal phenotype when homozygous, 316 when heterozygous, and 22 hemizygous (i.e. are located on the X chromosome) (Fig. 3b). This total is higher than 945 because some genes have an effect when both homo- and heterozygous.
Fig. 3A Number of genes tested and shown to be associated with higher level phenotypes, and B zygosity of associated mutations. Nearly 8,000 genes have been assessed for a skeletal phenotype, and 945 show a significant effect. The majority of these genes only show a significant association in homozygous mutants
The IMPC pipeline involves digital X-ray imaging of immobilised mice at 14 weeks of age, with results based on visual analysis of a minimum of four males and four females for 53 different IMPReSS (International Mouse Phenotyping Resource of Standardised Screens) parameters (Groza et al. 2023). Also in week 14, body composition of a minimum of seven males and seven females is assessed using a DEXA (Dual Energy X-ray Absorptiometry) analyser, and body length is measured. At 9 weeks of age, mice are assessed for obvious physical, behavioural, and morphological abnormalities under the CSD (Combined SHIRPA (SmithKline, Harwell, Imperial College, Royal Hospital, Phenotype Assessment (Rogers et al. 1997) and Dysmorphology) assessments, based on a minimum of seven males and seven females. Where mutations result in lethal or subviable strains, embryonic phenotypes are assessed based on visible morphological defects at E9.5; E12.5; E14.5–15.5.5; and/or E18.5 (gross embryo morphology). From these pipelines we identified 76 parameters of relevance to skeletal development (Fig. 4), with each parameter having an average of 14.3 associated genes (median 8).
Fig. 4. Number of genes associated with 76 skeletal parameters based on X-ray imaging of immobilised mice at 14 weeks of age (Xray); phenotype assessment and dysmorphology (CSD); gross embryo morphology at E9.5, e12.5, e14.5–15.5.5 and e18.5 (GEM); and body composition assessed using a DEXA (Dual Energy X-ray Absorptiometry) analyser (Body comp). Parameters marked with an x represent the 25 parameters selected for further study
Thirteen parameters (Caudal vertebrae morphology; Cervical vertebrae morphology; Fusion of ribs; Lumbar vertebrae morphology; Missing cranial rib; Number of cervical vertebrae; Number of thoracic vertebrae; Number of ribs (right); Number of ribs (left); Pelvic vertebrae morphology; Rib morphology; Scoliosis; Thoracic vertebrae morphology) had no genes associated with them and were removed from further analysis. We also removed two parameters associated with the ribcage (Shape of ribcage; Shape of ribs); eight associated with development of teeth or the skull (Branchial arch morphology; Craniofacial morphology; Mandibles; Maxilla/Pre-maxilla; Skull shape; Teeth; Teeth presence; Zygomatic bone); and twenty-six parameters associated with the appendicular skeleton (Brachydactyly; Clavicle; Digit integrity; Femur; Fibula; Hindlimbs – size; Hindpaw – shape; Humerus; Joints; Limb Bud Morphology; Limb morphology; Limb Plate Morphology; Number of digits; Pelvis; Polysyndactylism; Radius; Scapulae; Syndactylism; Syndactyly; Tibia; Tibia length (long); Tibia length (short); Ulna). The Body length parameter was also removed as this is perhaps more equivalent to a human “height” phenotype and is likely influenced by other factors beyond vertebral changes, as evidenced by the over 200 genes associated with this parameter (Fig. 4). Full details of excluded parameters, including the relevant IMPC codes can be found in the Supplemental Information.
The final dataset therefore comprised 25 parameters (marked with x in Fig. 4), and a non-redundant list of 204 genes (Supplemental File 1). This number represents 2.4% of genes tested in the IMPC v19 release, and 0.93% of all mouse genes, based on a total number of 21,955 genes in the GRCm39 mouse genome assembly. The number of genes associated with each of the 25 parameters varies from 1 to 30, with an average number of 10.92 genes per parameter (median 11). Partitioning by life stage showed 63 genes with embryo-only effects, 139 genes with adult-only effects, and 2 genes with effects in both embryos and adults (Eng, Nog) (Supplemental file 1). Across all hits, most were observed in homozygotes (233 genes), with fewer in heterozygotes (45) and only two (Nono and Rlim) in hemizygotes, due to their location on the X chromosome (Fig. 5, Supplemental Table 1). The IMPC reports mutant phenotypes by sex, and it is therefore possible to ask both whether male and female effect sizes differ from each other, and whether each differs from the control. We did not detect any sex-dimorphic effects under our prespecified criterion for male vs. female effects (a significant sex x genotype interaction in the IMPC mixed-model at α = 1 × 10⁻⁴), but when we instead summarize per-sex knockout effects against controls (i.e. separate male and female tests at the same α), we observe differences: 53 genes are significant only in males, 34 only in females, and 41 are were significant in both sexes. In addition, 71 genes did not consider sex in the analysis, and 5 were analysed by sex but did not reach the per-sex threshold (Supplemental file 1). However, it should be noted that a line can be significant in one sex and not the other without showing a significant sex x genotype interaction if (i) effect sizes are similar across sexes but one sex has lower power (smaller N, higher variance), (ii) both sexes show effects in the same direction with overlapping confidence intervals, or (iii) multiple-testing adjustments and testing centre-level heterogeneity push one sex just above or below the significance boundary. In other words, this “male-only” or “female-only” significance at fixed α may reflect differential power, not necessarily different biology.
Penetrance was estimated from per-animal categorical calls (abnormal vs. normal) for relevant assays after aggregating mutant counts by gene × parameter × life stage × zygosity. Using the sex-pooled strata, we obtained 290 entries spanning 132 unique genes across 12 candidate parameters (Supplemental file 1). The median pooled penetrance was 40.0% (IQR 0.2%). High-penetrance entries were frequent: 139 (≥ 50%) and 83 (≥ 80%). Parameters with the highest median penetrance included: ‘Sacral processes’ (IMPC_XRY_066_001, median = 100.0%); ‘Cervical processes’ (IMPC_XRY_063_001, median = 85.7%); ‘Tail – morphology’ (IMPC_GEM_029_001, median = 70.0%; IMPC_GEO_033_001, median = 68.3%); and ‘Lumbar processes’ (IMPC_XRY_065_001, median = 52.1%).
To test whether the 204 genes with significant vertebral phenotypes are spatially clustered in the mouse genome beyond what would be expected by chance, we asked (at several genomic scales) how many of these genes have at least one other vertebral candidate gene nearby on the same chromosome. For a given window size w (1 Mb, 2 Mb, 5 Mb, or 10 Mb), we assessed the number of target genes that have at least one target neighbour within w. We then compared the observed counts to a null distribution generated from 5,000 random gene sets (n = 204, sampled without replacement from the mm39 gene set) and recomputed under the same metric. At the 1 Mb scale, 52 of the 204 genes had a neighbour within 1 Mb on the same chromosome, compared with a permutation mean of 36.70 (SD 7.41), yielding a z-score of 2.07 and a Monte-Carlo right-tail p-value of 0.026. At broader scales, this enrichment attenuated: for 2 Mb, 73 observed vs. 62.72 ± 8.54 (z = 1.20, p = 0.128); for 5 Mb, 122 observed vs. 113.17 ± 7.90 (z = 1.12, p = 0.147); and for 10 Mb, 162 observed vs. 156.04 ± 6.03 (z = 0.99, p = 0.185). Thus, there is nominal evidence for short-range clustering at 1 Mb, but this signal does not survive a simple correction for testing four window sizes (e.g., Bonferroni α = 0.0125). Overall, the pattern suggests a modest tendency toward local clustering at very short distances that is weak and not robust to multiple-testing correction, and so these data do not support strong large-scale co-localisation of our vertebral genes across the genome that might imply shared regulatory pathways; at best, there is a mild excess of very close pairs consistent with short-range clustering.
Fig. 5. Chromosomal distribution of the 204 vertebral patterning genes from IMPC v19. These genes show no evidence of clustering that might suggest regulatory or functional associations. There are no genes on the Y chromosome
The 25 parameters can be assigned to six groups based on phenotype: “vertebral number” (22 genes) contains all those that increase or decrease the number of vertebrae; “vertebral processes” (35 genes) contains those that alter the bony projections from the vertebrae, such as the spinous, transverse, and articular processes; “tail morphology” (73 genes) is all genes affecting aspects of tail length, thickness, or overall morphology, including the embryonic tail bud; “vertebral form” (62 genes) comprises genes that alter vertebral shape, including vertebral fusions and transitional vertebrae; “somitogenesis” (24 genes) includes genes that alter somite formation; and “spine shape” (16 genes) includes all genes that alter the overall curve of the spine (Table 1, Supplemental Table 2). Whilst somewhat subjective, there is some rationale to these divisions: for example, vertebral number changes may be the result of changes to somite number, but are more likely the result of homeotic transformation, where vertebrae are transformed into the adjacent type (cervical to thoracic, thoracic to lumbar, lumbar to sacral etc.), and so genes affecting this process can be grouped on the basis of their involvement in this phenomenon. Vertebral processes are all derived from the sclerotome compartment of the somite, and this category reflects this shared origin. Whilst “spine shape” may be influenced by the presence of asymmetric or transitional vertebrae, which would be covered by the “vertebral form” group, this is a whole-spine classification, and so the two can be considered separately.
Table 1. Number of genes and unique genes per IMPC parameter and general vertebral categoryCategoryParameter nameParameter IDNumber of genes in parameterNumber of unique genes in category SomitogenesisSomite MorphologyIMPC_GEL_064_0012424 Spine shapeKyphosisIMPC_XRY_057_001716LordosisIMPC_XRY_058_0011Shape of spineIMPC_XRY_055_00114 Tail morphologyCurl or short tailIMPC_GEP_078_001873Tail - lengthIMPC_CSD_002_00112Tail – morphology (9wks)IMPC_CSD_004_00114Tail - thicknessIMPC_CSD_003_0016Tail bud morphologyIMPC_GEL_033_00111Tail Morphology (E12.5)IMPC_GEM_029_0019Tail Morphology (E14.5–15.5.5)IMPC_GEO_033_00114Tail Morphology (E18.5)IMPC_GEP_038_00118 Vertebral formFusion of vertebraeIMPC_XRY_019_0013062Shape of vertebraeIMPC_XRY_018_00119Transitional vertebraeIMPC_XRY_060_00115 Vertebral numberNumber of caudal vertebraeIMPC_XRY_017_0011722Number of lumbar vertebraeIMPC_XRY_015_0015Number of pelvic vertebraeIMPC_XRY_016_0014 Vertebral processesCaudal processesIMPC_XRY_067_001235Cervical processesIMPC_XRY_063_00111Fusion of processesIMPC_XRY_061_00111Lumbar processesIMPC_XRY_065_0012Processes on vertebraeIMPC_XRY_020_0015Sacral processesIMPC_XRY_066_0012Thoracic processesIMPC_XRY_064_00112
There is minimal overlap between groups (Fig. 6), with the largest overlaps between “vertebral processes” and “vertebral form” (7 genes shared), and “somitogenesis” and “tail morphology” (5 genes shared). The “vertebral number” group contains the most unique genes (20/22 or 91%).
Fig. 6. UpSet plot (Lex et al. 2014) of interactions between the six groups of genes with vertebral phenotypes (spine shape; vertebral number; somitogenesis; vertebral processes; vertebral form; and tail morphology). The horizontal bars on the left show the size of each dataset, and the vertical bars show the size of the interactions between the groups. There is minimal overlap between groups
The majority of these groups show no significant enrichment for particular Gene Ontology ‘biological process’ terms, using the release 19 non-random selection of 8,539 phenotyped genes as a reference, with three notable exceptions: for the ‘Somitogenesis’ group, there is enrichment for genes involved in tissue morphogenesis and neural tube development; the ‘Vertebral form’ group shows enrichment for limb morphogenesis and appendage development; and the ‘Tail morphology’ group shows enrichment for morphogenesis; skeletal system development; epidermis development; neural tube development; and pattern specification. Additionally, the ‘Vertebral processes’ group is enriched for genes involved in enzyme binding, specifically histone deacetylase binding (Supplemental File 2). When repeated with an all mouse gene reference dataset, the enrichment analysis found broadly similar results (data not shown).
Simple clustering of GO terms using ReviGO identified five clusters, which can be roughly classified as ‘development’; ‘metabolism’; ‘transport’; ‘regulation’; and ‘molecular organisation’ (Supplemental Figure S1, Supplemental File 3). Analysis of protein-protein interactions using STRING identified 21 clusters (Supplemental Figure S2) with high confidence (0.70–0.89), comprising 2–4 genes, and 12 clusters at highest confidence (0.90–1.0), comprising 2–3 genes (Supplemental Figure S3). Our dataset therefore comprises genes with a diverse set of functions, and with limited interactions between them.
Somite expression
We next sought to determine which of our 204 candidate genes were expressed during somite formation and maturation, using RNA-Seq data from dissected somite trios at six developmental stages (8, 18, 21, 25, 27, and 35 somite stages) from C57BL/6J mice (Ibarra-Soria et al. 2023). Twenty-two genes were not expressed in somites at these stages, and so may alter vertebral patterning later in development, or via changes to ossification (Supplemental File 4). The remaining 182 genes were skewed to higher expression levels compared to all genes, with 74.7% expressed above the median genome-wide expression level (Fig. 7A).
We next explored whether these genes show differential expression (DE) across development and found nearly 60% (106/182) have a statistically significant change across somite maturation (25 genes) and/or development (98 genes). The genes with expression differences between somites I, II, and III, which might be involved in early stages of somite maturation, included equal numbers of up- and down-regulated genes, most with subtle changes (Fig. 7B).
Fig. 7. Expression of our IMPC vertebral patterning genes during somite development. 182 of 204 genes are expressed in somites, and the majority of these show elevated expression: 75% fall within the peak of highly-expressed genes based on the ‘all genes’ distribution (vertical line indicates the first quartile for vertebral patterning genes mean expression) (A). 25 genes are significantly differentially expressed between somites I, II, and III, and clustering based on expression dynamics identified three patterns of expression (B, C). Roughly half of the genes (n = 13) are expressed highest in the most recently segmented somite (SI) and decrease as somites mature. The other half (n = 12) show the opposite behaviour
In contrast, genes that were significantly DE across developmental progression show clear expression changes between stages. This set of 98 genes can be clustered into 8 groups capturing the main patterns of expression (Fig. 8). Genes in clusters 6–8, which are upregulated in somites from the lumbar and sacral skeleton, often result in tail morphology phenotypes (5/9, 4/10 and 7/11 for clusters 6–8 respectively), while vertebral form and vertebral number phenotypes involve genes from all clusters.
An important limitation here is the substrain mismatch between datasets: our RNA-seq results are from C57BL/6J mice, and IMPC phenotype results are from C57BL/6 N-derived backgrounds. Known substrain variants could modulate baseline expression or trait penetrance in a subset of assays. While our analyses emphasize categorical outcomes and concordant signals across centres (features less sensitive to modest substrain effects) fine-grained quantitative comparisons should be interpreted with caution. Future work using matched substrains will refine these estimates.
Fig. 8. Differential expression of IMPC vertebral patterning genes during development. We find 98 genes that are differentially expressed across one or more pairs of stages of development, and cluster these into eight groups based on their expression dynamics
Vertebral pleiotropy
The majority of genes (n = 157) are associated with only a single vertebral parameter and so do not affect vertebral patterning more widely (Fig. 9), although three genes (Dnase1l2, Duoxa2, Fbn2) affect five vertebral phenotypes: Dnase1l2 is associated with ‘Shape of spine’; ‘Tail – length’; ‘Tail – morphology’; ‘Fusion of vertebrae’; and ‘Fusion of processes’; Duoxa2 is associated with ‘Shape of vertebrae’; ‘Caudal processes’; ‘Lumbar processes’; ‘Sacral processes’; and ‘Thoracic processes’; Fbn2 is associated with ‘Tail – morphology’; ‘Fusion of vertebrae’; ‘Caudal processes’; ‘Lumbar processes’; ‘Sacral processes’ (Supplemental File 1). However, knock out of these genes results in a large number of phenotypes overall (Dnase1l2 = 48 phenotypes; Duoxa2 = 77 phenotypes; Fbn2 = 43 phenotypes), far more than the 4.43 average of the entire IMPC dataset (Fig. 2), and so they are disrupting embryonic development across a wide range of anatomical systems. Potentially more interesting are those genes which affect only the developing vertebral column, i.e. those where vertebral parameters represent 100% of their associated phenotypes (n = 14), or those where vertebral phenotypes represent ≥ 50% of all phenotypes (n = 34) (Fig. 9, Table 2). Of this latter group, seven genes have preweaning lethality as their only non-vertebral phenotype, and all but nine are expressed in developing somites (Table 2). Three of these genes (Fggy,* Gldc*,* Tmc6*) are differentially expressed during somite maturation (i.e. between somites I, II, and III), and 14 genes are differentially expressed between at least one pair of developmental stages.
Fig. 9. Fourteen of the 204 genes affecting vertebral development have a vertebral phenotype as their only phenotype, and for 34 genes (shown in black) vertebral phenotypes represent ≥ 50% of their total phenotypes. Seven of these 34 genes have only preweaning lethality as their other phenotype
Table 2. Genes where vertebral parameters are 50% or more of phenotypes (n = 34), including those with only vertebral phenotypes (n = 14), and those where preweaning lethality is the only other phenotype (n = 7), with zygosity stage and status indicatedGene symbolParameter name (zygosity stage, status)Other phenotype(s) (zygosity stage, status)Expressed in somites?DE in somitesDE between stagesVertebral phenotype only 1110059G10Rik Transitional vertebrae (adult, hom)-YesNAcluster2 4932438H23Rik Thoracic processes (adult, hom)-NoNANA Abca4 Fusion of vertebrae (adult, hom)-YesNANA Fggy Fusion of vertebrae (adult, hom)-Yescluster1cluster4 Gm13547 Fusion of vertebrae (adult, hom)-NoNANA H2-Eb1 Shape of spine (adult, hom)-YesNANA Kdm7a Transitional vertebrae (adult, hom)-YesNAcluster5 Kera Cervical processes (adult, hom)-NoNANA Ppp1r42 Fusion of vertebrae (adult, hom)-NoNANA Spopl Thoracic processes (adult, hom)-YesNANA Tmc6 Fusion of vertebrae (adult, hom)-Yescluster2cluster1 Ushbp1 Fusion of vertebrae (adult, hom)-YesNAcluster2 Xndc1 Fusion of processes (adult, hom)-YesNANA Zscan2 Fusion of vertebrae (adult, hom)-YesNANAVertebral phenotype(s) + preweaning lethality only Chd1l Tail – length (adult, het)Preweaning lethality, incomplete penetrance (adult, hom)YesNAcluster5 Gldc Thoracic processes (adult, het)Preweaning lethality, complete penetrance (adult, hom)Yescluster2cluster1 L3mbtl2 Transitional vertebrae (adult, het)Preweaning lethality, complete penetrance (adult, hom)YesNAcluster1 Sel1l Fusion of vertebrae (adult, het)Preweaning lethality, complete penetrance (adult, hom)YesNANA Supt5 Transitional vertebrae (adult, het)Preweaning lethality, complete penetrance (adult, hom)YesNAcluster4 Vps53 Thoracic processes (adult, het)Preweaning lethality, complete penetrance (adult, hom)YesNANA Mir320 Tail – length (adult, het);Tail – thickness (adult, het)Preweaning lethality, complete penetrance (adult, hom)NoNANAVertebral and non-vertebral phenotypes Cbx5 Fusion of processes (adult, hom);Fusion of vertebrae; (adult, hom)Transitional vertebrae (adult, hom)Increased grip strength (adult, hom);Decreased circulating glucose level (adult, hom)YesNANA Cdcp3 Tail – thickness (adult, hom)Abnormal eye morphology (adult, hom)YesNA Dcdc2c Shape of vertebrae (adult, hom)Increased heart weight (adult, hom)NoNANA Hoxc12 Tail – length (adult, hom)Tremors (adult, hom)YesNAcluster2 Il12a Fusion of vertebrae (adult, hom)Increased circulating HDL cholesterol level (adult, hom)NoNANA Mbd1 Lordosis (adult, hom);Transitional vertebrae (adult, hom)Female infertility (adult, hom);Increased mean corpuscular hemoglobin (adult, hom)YesNAcluster5 Pramel17 Tail – morphology (adult, hom)Enlarged heart (adult, hom)YesNANA R3hdm1 Number of lumbar vertebrae (adult, hom);Number of pelvic vertebrae (adult, hom)Small testis (adult, hom);Abnormal testis morphology (adult, hom)YesNANA Ralb Thoracic processes (adult, hom)Increased leukocyte cell number (adult, hom)YesNAcluster3 Scn3b Fusion of vertebrae (adult, hom)Abnormal bone structure (adult, hom)NoNANA Sh2d5 Thoracic processes (adult, hom)Decreased circulating creatinine level (adult, hom)YesNAcluster7 Slc6a5 Number of lumbar vertebrae (adult, het);Number of pelvic vertebrae (adult, het)Increased mean corpuscular hemoglobin concentration (adult, het);Preweaning lethality, complete penetrance (adult, hom)NoNANA Vgll3 Shape of spine (adult, hom)Decreased startle reflex (adult, hom)YesNAcluster6All but nine genes are expressed in developing somites, although they do not always show differential expression (DE) between neighbouring somites or developmental stages
Discussion
The knockout mice generated by the International Mouse Phenotyping Consortium offer great potential to identify novel genes underlying important human health issues. Vertebral malformations are a key part of many human syndromes, including Klippel-Feil syndrome, which affects 1 in 40,000 births, and is typically characterised by fusion of cervical vertebrae, although more caudal regions can be involved (Raas-Rothschild et al. 1988; Clarke et al. 1998; Hachem et al. 2020); the VATER/VACTERL association of vertebral defects, anal atresia, tracheoesophageal fistula with esophageal atresia, and radial or renal dysplasia (Quan and Smith 1973), affecting 1 in 10,000 to 1 in 40,000 infants (Solomon 2011, 2018); spondylocostal dysostoses, affecting 1 in 40,000 births (Berdon et al. 2011; Nóbrega et al. 2021); craniofacial microsomia/Goldenhar syndrome affecting between 1 in 35,000 to 1 in 56,000 births; and many others (Giampietro et al. 2009). Vertebral malformations can also present on their own, where they may be associated with congenital scoliosis, which affects 0.5 to 1 in 1,000 births (Takeda et al. 2018; Sebaaly et al. 2022), and lower back pain (Nardo et al. 2012; Hopkins and Abbott 2015; Jat et al. 2023). Low back pain is the leading cause of disability in most countries, affecting some 619 million people in 2020, with a projection for this to rise to 843 million people by 2050 (Ferreira et al. 2023).
Our analysis of the International Mouse Phenotyping Consortium project data (release 19) has identified 204 genes that alter vertebral patterning when mutated, representing 0.93% of all mouse genes. Of these, the greatest number affect tail morphology (73 genes), suggesting that this part of the murine axial skeleton is perhaps most susceptible to (and tolerant of) change. Overall vertebral form, encompassing vertebral shape, fusion of adjacent vertebrae, and transitional vertebrae, is the next largest category with 62 genes. The majority of the 204 genes are associated with multiple phenotypes, both vertebral and non-vertebral, with a mean number of 12 phenotypes per gene (median 8), and so for many of these genes it is likely that the observed vertebral effects may be secondary outcomes of more fundamental changes to organ systems or wider developmental processes.
Of the 34 genes where vertebral phenotypes represent ≥ 50% of all phenotypes associated with the knockout, five of them (1110059G10Rik, Gm13547, Xndc1, Cdcp3, Pramel17) retrieved zero results in PubMed, and another five (4932438H23Rik, Ppp1r42, Zscan2, Supt5, Dcdc2c) have ≤ 5 results. Such findings are to be expected, as the IMPC specifically focusses on phenotyping genes for which there is little to no functional annotation (the so-called “dark genome” (Lloyd et al. 2020). The remaining genes range between 6 and 1,099 PubMed entries, with an average of 204 (median 96). Searching for the gene symbol with the Boolean operator “AND” and either ‘vertebra*’ or ‘somite’ did not identify any previous publications linking these genes with either somitogenesis or changes to vertebral anatomy. Searching for “gene symbol AND skeletal” resulted in a few relevant results for a subset of genes, and these are detailed further in the discussion section.
In early 2024, Szoszkiewicz et al. published a review of 118 genes they found to be involved in congenital vertebral malformations in humans (Szoszkiewicz et al. 2024). Of these, none overlap with our subset of 34 genes where vertebral phenotypes are ≥ 50% of all phenotypes, and only 10 (AFF4,* CELSR1*,* COL11A1*,* FUZ*,* GDF11*,* SHROOM3*,* SLC26A2*,* SLC29A3*,* SLC35D1*,* VANGL2*) are found in our larger set of 204 genes. Our dataset therefore comprises an extensive number of novel genes that affect vertebral patterning.
We identified 16 genes that alter spine shape, including kyphosis (Kcnv2); lordosis (Mbd1); overall spine shape (which reflects spine curvature, or scoliosis, Ccnd2,* Dnase1l2*,* H2-Eb1*,* Nisch*,* Pabpc4*,* Selenok*,* Uchl1*,* Vgll3*); or both kyphosis and overall spine shape (Ropn1l,* Sik3*,* Slc20a2*,* Tpte*,* Tram2*,* Wdr37*). These genes have between 20 and 2,498 results in PubMed (mean 355, median 106), and searches for “gene symbol AND kyphosis”; “gene symbol AND lordosis”; and “gene symbol AND scoliosis” returned no results for all genes except Uchl1, which has previously been associated with the development of scoliosis as part of a wider range of health issues associated with a novel form of Behr syndrome (McMacken et al. 2020). These 16 spine shape genes are associated with between 1 and 5 vertebral phenotypes, and between 1 and 77 overall phenotypes.
Fendri et al. previously identified 145 differentially-expressed genes (86 upregulated, 59 downregulated) in primary osteoblasts from spinal vertebrae of Adolescent Idiopathic Scoliosis (AIS) patients compared to unaffected controls (Fendri et al. 2013), and the only overlap to our dataset is the paired-like homeodomain transcription factor 1 (Pitx1), which they find to be downregulated, and which we find to result in the formation of transitional vertebrae. We have therefore identified 15 novel genes that alter spine shape.
At least some of the 34 genes where vertebral phenotypes represent ≥ 50% of the assigned phenotypes show intriguing links to relevant developmental processes. For example, the Spopl paralog Spop has been shown to be involved in skeletal development, through regulation of the hedgehog signalling pathway, specifically Indian hedgehog (Ihh), but a Spopl knockout did not produce apparent skeletal defects (Cai and Liu 2016). However, that study was based on the Spopl^tm1(KOMP)Vlcg^ mutant, which deletes a region starting within exon 2 and ending within exon 11 between positions 23,401,253 − 23,435,530 of chromosome 2, whereas the IMPC phenotype is based on Spopl^tm1a(EUCOMM)Wtsi^ mutation (chr2:23,432,934 − 23,433,814) and deletes only a single exon (exon 5). It is possible that the vertebral arch phenotype identified in the IMPC data was too subtle to be seen in those experiments.
The microRNA Mir320 is in the MIPF0000163 mir-320 gene family, along with human MIR-320 A, B, C and D, which have been shown to be expressed in mesenchymal stem cells and which may play a role in the switch between the formation of bone marrow osteoblasts and adipocytes (Hamam et al. 2014). Il-12 mutations decrease osteogenesis (Xu et al. 2020), and Mbd1 is expressed in chrondrocytes (Ząbek et al. 2022). Kdm7a is also involved in bone development, especially adipocyte and osteoblast differentiation (Yang et al. 2019), and has been claimed to alter development of the axial skeleton in mice through alteration of the expression of posterior Hox genes (Higashijima et al. 2020; Li et al. 2023). Cbx5 is involved in osteoblast differentiation, and may alter Hox gene expression through modification of chromatin structure (van Wijnen et al. 2021; Dashti et al. 2024), in a similar way to that of the paralogous Cbx2 (Sato et al. 2020). Vgll3 has a role in osteogenic and myogenic differentiation (Gabay Yehezkely et al. 2020; Yuan et al. 2022) as does the related Vgl-2 in chicken and zebrafish (Mann et al. 2007; Bonnet et al. 2010).
The carbohydrate kinase Fggy is expressed in skeletal muscle, and has been suggested to have a role in muscle cell differentiation (Smith et al. 2021), and therefore the somite expression of this gene (Table 2) suggests a role in the dermomyotome, which gives rise to skeletal muscles. Ralb is also expressed in skeletal muscle (Wildey et al. 1993; Chen et al. 2019). CHD1L has been implicated in skeletal defects in stillborn fetuses, nervous system development, and short stature (Dou et al. 2017; Henrie et al. 2018; Workalemahu et al. 2024), and Gldc is expressed in neuromesodermal progenitors (Gouti et al. 2017; Chang et al. 2022). Finally, Hoxc12 has been implicated in vertebral development, through disruption of the genomic region in deletion of the Hotair long non-coding RNA (lncRNA) (Amândio et al. 2016), although given the role of Hox genes in axial pattering (Krumlauf 1994; Burke et al. 1995; Carapuço et al. 2005; Wellik 2007), such a role is not surprising, and it is likely that the mildness of the phenotype (short tail) can be explained through compensation by the paralogous Hoxd12. We find that these short-tail Hoxc12 knockout mice (Fig. 10) have fewer caudal vertebrae than wild type controls (26.5 vs. 27.8 respectively), based on an analysis of the IMPC X-ray images for 12 mutant and 15 wildtype mice. Variations in the number of caudal vertebrae such as this do however highlight one of the major differences between humans and mice – the tail. Whilst the mouse can be a useful model for conditions affecting the majority of vertebral regions, the caudal region shows the largest difference between the two species, with only three to five fused segments in humans (Öztürk et al. 2023). Findings from the mouse are perhaps to be more useful in evolutionary studies investigating tail length variation, as recently demonstrated for jerboas (Weber et al. 2024).
Fig. 10. Wildtype (specimen C2145, left) and Hoxc12 knockout (specimen J95772, right) mice. The Hoxc12 knockouts have significantly shorter tails, and fewer caudal vertebrae
Our gene ontology and network analyses suggested that the genes we have identified represent members of a diverse range of biological processes in vertebral specification, development, and ossification. Of the total 14,706 annotated mouse protein-coding genes, 7,581 are expressed in somites (51.55%), and therefore our 182/204 somite-expressed genes (89.2%) can be considered an enrichment, as would be expected for genes having a skeletal phenotype. Our dataset therefore comprises genes with a diversity of roles in vertebral development, covering somitogenesis, the specification of vertebral identity, sclerotome migration and differentiation, and later chondrogenesis and ossification. Given the large number of knockout phenotypes that alter vertebral shape and vertebral processes (62 and 35 respectively, or 87 overall when duplicate genes found in both categories are removed), it seems likely that the sclerotome differentiation and subsequent chondrogenesis and ossification are some of the most sensitive parts of vertebral development.
Within our set of 204 genes, nine (Barx2,* Cyb561*,* Fbn2*,* Gal3st1*,* Hip1*,* Lrrk1*,* Micu1*,* Npr2*,* Sirt3*) have a vertebral phenotype and affect gait (MP:0001406 ‘abnormal gait’); seven genes (Cyp27b1,* Duoxa2*,* Fbn2*,* Lrrk1*,* Sik3*,* Slc29a1*,* Tm9sf4*) affect both vertebral development and the ribcage (MP:0000150 ‘Abnormal rib morphology’; MP:0004624 ‘abnormal thoracic cage morphology’); and eight (Cyp27b1,* Fbn2*,* Lrrk1*,* Runx2*,* Sik3*,* Ttc28*,* Twist1*,* Ube2g1*) result in abnormal development of the pelvic girdle (MP:0004508 ‘abnormal pectoral girdle bone morphology’). The IMPC knockout mice that we have identified therefore represent not only a powerful resource to improve our understanding of the processes underlying vertebral development, but also the links between these processes and the development of associated skeletal structures, with relevance for our understanding of a diversity of human diseases.
Conclusions
Mammalian vertebral development is a complex process, and the loss of a single gene is often sufficient to produce extreme phenotypes such as fewer or fused vertebrae. We have identified 204 genes where the knockout phenotype results in changes to vertebral anatomy, including 14 which only produce a vertebral phenotype, and a further 20 where vertebral phenotypes represent ≥ 50% of all recorded phenotypes. The fact that nearly 1% of all mouse genes can alter vertebral development and anatomy reflects the complexity of the underlying processes, and demonstrates that novel cellular and molecular mechanisms could underlie vertebral phenotypes in a range of human health issues.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary Material 1
Supplementary Material 2
Supplementary Material 3
Supplementary Material 4
Supplementary Material 5
Supplementary Material 6
The reference list from the paper itself. Each links out to its DOI / PubMed record.
