IGD: a multi-omics database for Ipomoea pes-caprae genomic and biological research
Jiahao Cai, Xingguang Chen, Xueli Li, Mingyue Zhang, Xinqiang Lin, Yile Hu, Haoran Feng, Xinyu Li, Jinbin Hu, Shuqi Yang, Lulu Wang, Xiaoping Niu, Gang Wang, Boping Tang, Sheng Wang, Yuan Qin, Yan Cheng

TL;DR
This paper introduces IGD, a comprehensive multi-omics database for the salt-tolerant plant Ipomoea pes-caprae to support genomic and biological research.
Contribution
The first integrated multi-omics database for Ipomoea pes-caprae with stress-responsive gene subsets and user-friendly tools.
Findings
IGD provides genomic sequences, annotations, and transcriptomic data under various stress conditions.
The database includes specialized tools for stress-related gene screening and evolutionary analysis.
Automated pipelines ensure data quality and scalability for future omics integration.
Abstract
Ipomoea pes-caprae (IPC) is a perennial halophytic vine with remarkable salt and drought tolerance, playing a critical ecological and medicinal role in tropical and subtropical coastal ecosystems. Despite the availability of a high-quality chromosome-level reference genome and abundant transcriptome data, the absence of an integrated data platform has hindered in-depth functional gene discovery and genomic research in IPC. To address this gap, we developed the IPC Genome Database (IGD), the first comprehensive multi-omics database dedicated to IPC. IGD provides high-quality genomic sequences, gene structure annotations, and functional annotations, along with transcriptomic expression profiles under salt stress across different tissues. It also includes time-course expression data of roots and leaves under salt stress treatment, as well as leaf expression profiles under cold and heat…
Funding
Natural Science Foundation of Fujian Province
National Natural Science Foundation of China
Introduction
Halophytes, owing to their distinctive physiological mechanisms and molecular regulatory networks for salt and alkali tolerance, have become important subjects for elucidating plant stress adaptation mechanisms and evolutionary strategies [1]. These plants not only provide key gene resources for molecular breeding of stress-resistant crops but also offer a theoretical foundation for the ecological management and sustainable use of saline-alkali lands. Ipomoea pes-caprae (IPC), a representative halophyte, is a perennial vine of the genus Ipomoea (family Convolvulaceae) that is widely distributed in tropical and subtropical coastal ecosystems [2]. The species combines multiple stress-tolerance traits, including salt and drought resistance, and plays a key ecological role in coastal sand fixation, slope protection, and the restoration of degraded ecosystems. In addition, its bioactive compounds show potential for medicinal applications such as anti-inflammatory and analgesic treatments.
Given the scientific and industrial value of IPC, our research team performed de novo transcriptome sequencing and assembly of IPC leaves under heat and cold stress in 2021 [3]. In 2023, we then published a high-quality chromosome-level reference genome of IPC, together with transcriptomic data from different tissues under salt stress [4]. Using this reference genome, we re-mapped the time-course salt-stress RNA-seq data previously published by Liu et al. [5] and integrated the datasets into an updated expression matrix, which provided deeper insight into the molecular responses of IPC to salt stress. Major model plants such as Arabidopsis thaliana, Oryza sativa, and Zea mays already have functional genomics databases that provide standardized data platforms for research [6], but no comparable resource has existed for IPC. To fill this gap, we developed IGD (https://ipcgenome.org.cn), the first integrated functional genomics database for IPC. Through a web browser, users can access and query the data types provided by the platform, including genome information, transcriptome data, and functional annotations. Beyond managing multi-omics data and making them easy to access, the database integrates a range of online analysis tools, providing a one-stop platform for functional genomics research and evolutionary analysis of IPC. During construction, we implemented automated pipelines for regular data updates and quality control to keep the data accurate and current. For IPC’s distinctive traits, such as salt tolerance and stress resistance, we built a dedicated functional gene subset database to support rapid candidate gene screening and evolutionary analysis.
The platform not only provides data infrastructure for basic research on IPC but also lowers the barrier to bioinformatics analysis, helping research on this non-model species move from “data accumulation” to “functional interpretation”. In the future, the database will continue to incorporate omics data from additional halophytes, providing core data support for elucidating adaptive evolution in salt-tolerant plants and for breeding stress-resistant crops.
Methods
Data sources
The genomic data used to build IGD were obtained from our previously published genome sequencing and assembly project [4]. The raw sequencing reads (CRA013080) and the final genome assembly (GWHEQBK00000000) have been deposited in the China National Center for Bioinformation (https://www.cncb.ac.cn) and form the foundational dataset of the database. Some of the RNA-seq data came from our previously published transcriptomic study of IPC leaves under heat and cold stress (Project ID: PRJNA656146) [3], while other RNA-seq data came from the publicly available time-course salt stress dataset published by Liu et al. (Project ID: PRJNA656933) [5]. We integrated these datasets and analyzed gene expression patterns across different tissues under salt stress [4]. Together, these public datasets are the core resources from which the database was constructed.
Construction of GO and KEGG annotation database
To functionally annotate IPC genes, we used eggNOG-mapper [7] to build an annotation database. Protein sequences predicted from the IPC genome were submitted to the eggNOG-mapper web server (http://eggnog-mapper.embl.de/) for automated annotation, and the resulting annotations were used for GO and KEGG enrichment analyses with the R package clusterProfiler [8–11].
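Before clusterProfiler can use eggNOG-mapper results, the annotation table has to be turned into gene-to-term mappings. The paper does not show this step, so the following is a minimal sketch under stated assumptions: it expects the tab-separated `*.emapper.annotations` file written by eggNOG-mapper v2, with a `#query` header row and comma-separated `GOs` and `KEGG_ko` columns, and it builds the two-column (term, gene) pairs that clusterProfiler's `enricher()` accepts as `TERM2GENE`. The function names, gene IDs, and KO in the toy input are illustrative placeholders, not IGD data.

```python
import io
from collections import defaultdict


def parse_emapper_annotations(handle):
    """Map each query to its GO terms and KEGG orthologs (KOs).

    Assumes the eggNOG-mapper v2 layout: '##' lines are run metadata, the
    header row starts with '#query', and empty fields are written as '-'.
    Columns are looked up by name, so extra columns are simply ignored.
    """
    go_map, ko_map = defaultdict(set), defaultdict(set)
    header = None
    for raw in handle:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#query"):
            header = [name.lstrip("#") for name in fields]
            continue
        if header is None:
            continue  # data before a header row cannot be interpreted
        row = dict(zip(header, fields))
        gene = row["query"]
        for term in row.get("GOs", "-").split(","):
            if term not in ("", "-"):
                go_map[gene].add(term)
        for ko in row.get("KEGG_ko", "-").split(","):
            if ko not in ("", "-"):
                ko_map[gene].add(ko[3:] if ko.startswith("ko:") else ko)
    return dict(go_map), dict(ko_map)


def term2gene(gene_to_terms):
    """Flatten gene -> terms into sorted (term, gene) pairs, the two-column
    shape that clusterProfiler's enricher() accepts as TERM2GENE."""
    return sorted((term, gene) for gene, terms in gene_to_terms.items() for term in terms)


# Toy input with a reduced column set; gene IDs and the KO are placeholders.
demo = io.StringIO(
    "## emapper run metadata\n"
    "#query\tevalue\tGOs\tKEGG_ko\n"
    "GENE_A\t1e-50\tGO:0006950,GO:0009651\tko:K00001\n"
    "GENE_B\t1e-30\t-\t-\n"
)
go, ko = parse_emapper_annotations(demo)
pairs = term2gene(go)
```

Genes with no GO or KO assignment (GENE_B above) are left out of the mappings rather than stored with empty sets, which matches how a TERM2GENE table simply has no rows for them.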
Database system architecture
IGD uses a decoupled front-end/back-end architecture for scalability, modularity, and a responsive user experience. The back end combines Spring Boot, which handles core biological data operations, with Flask [12], which handles lightweight tool interfacing and data routing. The front end is built with Vue.js and Bootstrap [13], with Element UI providing dynamic interface components and interactivity. The data layer is a MySQL [14] relational database that stores and manages genomic sequences, gene models, functional annotations, and expression profiles. To improve performance, Redis [15] is deployed as a caching layer, which speeds up data retrieval and reduces redundant database queries. The platform is hosted on an Ubuntu Linux server, which provides a stable and extensible environment for concurrent user access. RESTful APIs connect the modules and allow integration with external bioinformatics tools and resources.
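The paper states that Redis caches database lookups but not which caching strategy is used. A common choice is cache-aside: check the cache, fall back to the database on a miss, then store the result with a time-to-live. The sketch below assumes that strategy and, to stay self-contained, substitutes Python's built-in `sqlite3` for MySQL and a small dictionary class for Redis; the table schema, class names, and the gene record are hypothetical.

```python
import json
import sqlite3
import time


class TTLCache:
    """In-process stand-in for Redis: stores string values with an expiry time."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)


def make_demo_db():
    """In-memory SQLite table standing in for a MySQL gene table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE gene (gene_id TEXT PRIMARY KEY, chrom TEXT, start_pos INTEGER, end_pos INTEGER)"
    )
    db.execute("INSERT INTO gene VALUES (?, ?, ?, ?)", ("GENE_A", "Chr1", 1000, 2500))
    return db


class GeneService:
    """Cache-aside read path: try the cache, fall back to the database, then fill the cache."""

    def __init__(self, db, cache):
        self.db = db
        self.cache = cache
        self.db_queries = 0  # how many lookups reached the database

    def get_gene(self, gene_id):
        key = f"gene:{gene_id}"
        cached = self.cache.get(key)
        if cached is not None:
            return json.loads(cached)
        self.db_queries += 1
        row = self.db.execute(
            "SELECT gene_id, chrom, start_pos, end_pos FROM gene WHERE gene_id = ?",
            (gene_id,),
        ).fetchone()
        if row is None:
            return None
        record = dict(zip(("gene_id", "chrom", "start", "end"), row))
        self.cache.set(key, json.dumps(record))  # Redis values are strings, so serialize
        return record


service = GeneService(make_demo_db(), TTLCache(ttl_seconds=60))
first = service.get_gene("GENE_A")   # miss: queries the database
second = service.get_gene("GENE_A")  # hit: served from the cache
```

With a real Redis server, `TTLCache.get` and `set` would map onto Redis `GET` and `SET` with an expiry (the `EX` option). Because the automated update pipelines rewrite the underlying tables, a short TTL or explicit invalidation after each update keeps cached records from going stale.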
Visualization and interactive interface
To support data interpretation and user interaction, IGD provides a range of visualizations and graphical interfaces. Several visualization tools were built with the R Shiny framework [16], which allows parameters to be adjusted and interactive charts to be re-rendered in real time. The expression heatmap module supports log transformation, z-score normalization, and hierarchical clustering, letting users explore gene expression dynamics across treatments and time points. The enrichment analysis module, powered by the clusterProfiler R package [8], produces bubble plots and bar graphs showing the statistical significance and functional categories of enriched GO terms and KEGG pathways. The phylogenetic tree module combines multiple sequence alignment (MAFFT [17]) with tree building (FastTree [18]) and displays trees with customizable branch labels and color-coded clades to support evolutionary analysis. Chromosomal localization of genes is drawn by in-house scripts that generate dynamic SVG plots from genome coordinates, while genome-wide navigation is provided by JBrowse2 [19], with track selection, zooming, and annotation browsing. Visual components for large datasets are precomputed and rendered dynamically according to user selections, balancing efficiency and flexibility. Together, these features support interactive exploration of IPC genomic resources.
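The three heatmap preprocessing steps named above can be illustrated in a few lines. This is a minimal sketch, not IGD's Shiny code: it assumes a log2(FPKM + 1) transform, row-wise z-scores using the sample standard deviation (as R's `scale()` does), and complete-linkage clustering on Euclidean distance (the default `method` of R's `hclust()`); the paper does not state the pseudocount or the linkage method. Running the clustering on the rows and on the transposed matrix gives the gene and sample orders for a bidirectionally clustered heatmap. The FPKM values are toy numbers.

```python
import math
import statistics
from itertools import combinations


def log2_transform(matrix, pseudocount=1.0):
    """log2(value + pseudocount); the pseudocount keeps zero FPKM values finite."""
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]


def row_zscore(matrix):
    """Center each gene (row) on its mean and scale by its sample standard deviation."""
    scaled = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row) if len(row) > 1 else 0.0
        # constant rows carry no pattern, so map them to zeros instead of dividing by 0
        scaled.append([0.0 if sd == 0 else (v - mean) / sd for v in row])
    return scaled


def complete_linkage_order(vectors):
    """Naive agglomerative clustering with Euclidean distance and complete linkage.

    Returns a leaf order for one heatmap axis; intended for small inputs only.
    """
    n = len(vectors)
    clusters = {i: [i] for i in range(n)}
    pair_dist = {(i, j): math.dist(vectors[i], vectors[j])
                 for i, j in combinations(range(n), 2)}

    def linkage(a, b):
        # complete linkage: distance between the two farthest members
        return max(pair_dist[min(i, j), max(i, j)]
                   for i in clusters[a] for j in clusters[b])

    next_id = n
    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2), key=lambda ab: linkage(*ab))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)  # merge the closest pair
        next_id += 1
    return next(iter(clusters.values()), [])


fpkm = [[0, 1, 3, 7], [5, 5, 5, 5], [1, 3, 7, 15]]  # toy genes x samples
z = row_zscore(log2_transform(fpkm))
gene_order = complete_linkage_order(z)
sample_order = complete_linkage_order(list(zip(*z)))
```

Genes 0 and 2 have the same shape after z-scoring, so they are merged first and end up adjacent in `gene_order`. The naive loop only demonstrates the technique; genome-scale matrices call for an optimized routine such as R's `hclust()`. Leaf order within a dendrogram is not unique, so other tools may draw the same tree with some branches flipped.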
Integration of third-party tools
To enhance the functional breadth and analytical depth of the platform, the IGD system integrates a variety of mainstream open-source bioinformatics tools, enabling online access through a unified interface. The sequence alignment feature is powered by SequenceServer [20], allowing users to submit nucleotide or protein sequences online for BLAST analysis. The results are presented with both graphical visualization and annotated hyperlinks for further exploration. The genome browser module is built with JBrowse2, supporting dynamic multi-track loading and interactive browsing of gene structures, annotation information, and RNA-seq coverage. The primer design tool integrates PrimerServer2 [21, 22], enabling efficient primer screening based on gene IDs or custom sequences. It outputs detailed parameters including primer sequences, melting temperatures (Tm), and expected product lengths. In addition, the platform provides preprocessed expression matrices, annotation files, and sequence data for user download, facilitating further in-depth analysis with local visualization tools. All integrated tools are independently deployed on the platform server and accessed via a web-based interface to ensure fast response times and stable performance (Fig. 1).
Fig. 1. Database architecture design diagram. Data sources, implementation methods, and results of database construction in IGD
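SequenceServer runs NCBI BLAST+ searches from the browser. Users who repeat searches locally against the downloadable IGD sequences typically keep only significant hits and the best hit per query. The sketch below assumes the standard 12-column BLAST+ tabular layout (`-outfmt 6`) and reuses the E-value cutoff of 1e-5 quoted for the BLAST engine in the Tools integration section; the identifiers and scores in the demo are placeholders, not real search results.

```python
BLAST_TAB_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def best_hits(lines, max_evalue=1e-5):
    """Return the highest-bitscore hit per query among hits with evalue <= max_evalue."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks and comment lines (e.g. from -outfmt 7)
        hit = dict(zip(BLAST_TAB_COLUMNS, line.rstrip("\n").split("\t")))
        evalue, bitscore = float(hit["evalue"]), float(hit["bitscore"])
        if evalue > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current["bitscore"]:
            best[hit["qseqid"]] = {
                "subject": hit["sseqid"],
                "pident": float(hit["pident"]),
                "evalue": evalue,
                "bitscore": bitscore,
            }
    return best


# Placeholder identifiers; values are illustrative, not real search results.
demo = [
    "query_1\tipc_gene_A\t85.2\t300\t40\t2\t1\t300\t5\t304\t1e-80\t250.0",
    "query_1\tipc_gene_B\t60.0\t280\t100\t5\t1\t280\t1\t280\t1e-20\t90.0",
    "query_2\tipc_gene_C\t30.0\t50\t30\t2\t1\t50\t1\t50\t0.01\t30.0",
]
hits = best_hits(demo)
```

Ranking by bit score rather than E-value keeps the comparison independent of database size; `query_2` disappears entirely because its only hit fails the E-value cutoff.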
Results
IPC database content
To advance research on IPC in genomics, molecular biology, and abiotic stress responses, we developed the IPC Genome Database (IGD), an interactive web-based platform that integrates data retrieval, functional analysis, and visualization and gives researchers efficient access to a wide range of genomic resources and analytical tools. IGD consists of seven core modules: Home, Genomics, Genes, Transcriptomics, Tools, Downloads, and Help. The database currently contains a high-quality IPC genome assembly and multiple transcriptome datasets from various tissues and treatments, including cold and heat stress responses in leaves, time-course salt stress treatments in roots and leaves, and tissue-specific transcriptional responses under salt stress. These resources support investigation of the molecular mechanisms of halophyte adaptation to salt and temperature stresses. The Help module provides guidance on the functions, application scenarios, and usage of each module and tool, so that users can navigate the platform efficiently (Fig. 2).
Fig. 2. The homepage of the Ipomoea pes-caprae genome database
Genomics module
The Genomics module provides fundamental support for structural and functional genomic studies of IPC. It has three sub-functions: JBrowse, Gene Location Visualization, and Sequence Fetch. The JBrowse sub-function uses the JBrowse2 genome browser [19] to display the IPC genome. Users can select a target chromosome from the search bar at the top or enter specific genomic coordinates. For example, selecting the region 5,046,276–5,214,277 bp on chromosome Chr1 shows several genes in this interval, such as Ipc01G0001500. Clicking a gene displays its transcript structure, including the positions of all exons and introns, together with annotation information and expression levels (Fig. 3A). Users can also retrieve the DNA sequence of the selected gene. The zoom function enables multi-scale navigation, from whole-genome structure down to single-nucleotide resolution. The Gene Location Visualization function maps target genes onto the reference chromosomes and quickly generates high-resolution images, which is particularly useful for gene family analysis (Fig. 3B). The Sequence Fetch tool retrieves user-defined genomic regions, supporting applications such as chromatin-associated analyses (e.g., ChIP-seq) (Fig. 3C).
Fig. 3. Genomics module. A JBrowse2: a modern, interactive genome browser for visualizing and exploring genomic data. B Gene chromosome localization visualization tool. C Sequence extraction tool on chromosomes
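Sequence Fetch and the browser's sequence export both come down to slicing a chromosome by coordinates. Genome browsers typically display 1-based, fully closed intervals, whereas Python strings are 0-based and half-open, so the coordinate conversion is where such code usually goes wrong. A minimal sketch, assuming 1-based inclusive input coordinates and a FASTA-formatted genome; the function names and the toy sequence are made up for illustration.

```python
def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}; the ID is the first word of the header."""
    seqs, name, chunks = {}, None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def fetch_region(genome, chrom, start, end):
    """Slice chrom:start-end using 1-based, inclusive coordinates (genome-browser style)."""
    if chrom not in genome:
        raise KeyError(f"unknown sequence: {chrom}")
    if not 1 <= start <= end <= len(genome[chrom]):
        raise ValueError(f"{chrom}:{start}-{end} is outside 1-{len(genome[chrom])}")
    # start - 1 converts to 0-based; a 1-based inclusive end equals a 0-based exclusive end
    return genome[chrom][start - 1:end]


toy = read_fasta([">Chr1 toy sequence", "ACGTACGTAA", "CCGGTTAACC"])
region = fetch_region(toy, "Chr1", 9, 12)
```

Here `Chr1:9-12` returns four bases, as a browser would show for that interval.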
Genes module
The Genes module offers two main functions: Gene Search and Homolog Gene Search. In Gene Search, users enter a full or partial gene ID, and the system displays the gene's structure, chromosomal location, gene and functional annotations, and its gene, CDS, and protein sequences. Users can also view the gene's expression under various conditions: in leaves under cold and heat treatments, in roots and leaves during time-course salt stress, and across organs under salt stress [3–5] (Fig. 4A). Homolog Gene Search rapidly identifies homologous genes between IPC and Arabidopsis thaliana. Users can search by gene ID, gene name, or functional description from either IPC or Arabidopsis. For example, entering “SOS” displays all genes associated with the SOS (Salt Overly Sensitive) gene family (Fig. 4B), while entering “kinase” retrieves all genes related to kinase functions (Fig. 4C). The interface responds quickly, making it efficient to identify target genes and improving the accuracy and efficiency of homologous gene analysis.
Fig. 4. Genes module. A Retrieve detailed gene information by querying gene IDs. B Identification of homologous genes between IPC and Arabidopsis thaliana by gene name. C Retrieval of gene sets related to specific functions, such as “kinase”, through keyword-based searches of functional descriptions
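The paper does not describe how the homolog search matches queries. A minimal in-memory sketch of keyword search, assuming a table of IPC-Arabidopsis homolog pairs with gene IDs, gene symbols, and functional descriptions, and case-insensitive substring matching over all of these fields. The record type, function name, and every entry (including the placeholder gene IDs) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HomologPair:
    ipc_gene: str      # IPC gene ID
    at_gene: str       # Arabidopsis gene ID
    at_symbol: str     # Arabidopsis gene symbol
    description: str   # functional description


def search_homologs(pairs, query):
    """Return pairs where the query occurs (case-insensitively) in any field."""
    needle = query.strip().lower()
    if not needle:
        return []
    return [
        pair for pair in pairs
        if any(needle in field.lower()
               for field in (pair.ipc_gene, pair.at_gene, pair.at_symbol, pair.description))
    ]


# Hypothetical placeholder entries, not IGD records.
demo = [
    HomologPair("IPC_GENE_1", "AT_GENE_1", "SOS1", "Na+/H+ antiporter"),
    HomologPair("IPC_GENE_2", "AT_GENE_2", "SOS2", "CBL-interacting serine/threonine-protein kinase"),
    HomologPair("IPC_GENE_3", "AT_GENE_3", "HKT1", "sodium transporter"),
]
sos_hits = search_homologs(demo, "SOS")
kinase_hits = search_homologs(demo, "kinase")
```

Plain substring matching is deliberately broad: “SOS” matches SOS1 and SOS2, but a short query can also hit unrelated words inside descriptions. Matching whole words or gene-symbol prefixes is a common way to tighten it.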
Transcriptomics module
The transcriptomics module of IGD supports transcriptome data mining, particularly for stress-responsive gene regulation in IPC, by combining gene expression visualization with functional enrichment tools. The “Studies Detail” page gives a centralized view of transcriptomic sample information so that users can quickly check the experimental background. The “Gene Expression Visualization” function generates heatmaps on demand from normalized FPKM values across tissues and stress conditions. It performs hierarchical clustering with Euclidean distance on both genes and samples, helping users identify expression modules or co-expression clusters with potential regulatory relevance (Fig. 5A). Users can customize visualization parameters such as color gradients, block shapes, and layout ratios, which is particularly useful for deciphering expression patterns in response to abiotic stresses such as salt and temperature. For functional annotation and enrichment analysis, the “GO and KEGG Enrichment” module (Fig. 5B) integrates the latest releases of the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. For a submitted gene set, the system performs enrichment analysis based on the hypergeometric test and automatically generates two kinds of plots: bubble plots showing the significance (-log10 P-value) and gene-set coverage of enriched terms, and bar charts showing the number of genes in key metabolic pathways or molecular functions.
All results are exportable in SVG/PDF formats and accompanied by comprehensive statistical tables, including adjusted P-values, enrichment factors, and other relevant metrics, providing multidimensional evidence for gene function and regulatory network interpretation. Through algorithmic optimization, this module enables efficient processing of large-scale transcriptomic data. Its interactive visualization design significantly enhances the readability of complex datasets and facilitates the translational application of transcriptomic research.
Fig. 5. Transcriptomics module. A Expression heatmap visualization. B GO enrichment analysis bubble plot and bar chart
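The enrichment statistics described above can be written out directly. For a background of N genes, of which K belong to a term, and a submitted list of n background genes, of which k belong to that term, the hypergeometric P-value is the probability of seeing k or more term genes by chance: P(X ≥ k) = Σ C(K, i) C(N − K, n − i) / C(N, n), summed over i from k to min(K, n). The sketch below computes this with `math.comb`, adjusts P-values with the Benjamini-Hochberg procedure (clusterProfiler's default `pAdjustMethod`), and reports an enrichment factor defined here as (k/n)/(K/N). The paper does not specify IGD's adjustment method or enrichment-factor formula, so both are assumptions; the sketch also omits clusterProfiler's gene-set size filters, and the terms and genes are toy placeholders.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the term, n drawn)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(study, background, term_to_genes):
    """Hypergeometric over-representation test for each term with at least one study gene."""
    background = set(background)
    study = set(study) & background
    N, n = len(background), len(study)
    results = []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        K, k = len(annotated), len(annotated & study)
        if k == 0:
            continue
        results.append({
            "term": term, "k": k, "n": n, "K": K, "N": N,
            "pvalue": hypergeom_sf(k, N, K, n),
            "enrichment_factor": (k / n) / (K / N),
        })
    adjusted = benjamini_hochberg([r["pvalue"] for r in results])
    for r, p_adj in zip(results, adjusted):
        r["p_adjust"] = p_adj
    return sorted(results, key=lambda r: r["pvalue"])


# Toy example: 10 background genes, two terms, three submitted genes.
background = [f"g{i}" for i in range(10)]
terms = {"TERM_1": ["g0", "g1", "g2", "g3"], "TERM_2": ["g4", "g5", "g6", "g7", "g8", "g9"]}
table = enrich(["g0", "g1", "g2"], background, terms)
```

In the toy data, all 3 submitted genes fall in a 4-gene term out of a 10-gene background, so P = C(4,3) C(6,0) / C(10,3) = 4/120 ≈ 0.033 and the enrichment factor is (3/3)/(4/10) = 2.5.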
Tools integration
Beyond its core analytical modules, the database integrates a suite of interoperable tools for diverse genomic research needs. The BLAST engine supports cross-species homology searches: users can submit protein or nucleotide sequences from any species against a dedicated, locally indexed IPC database (E-value ≤ 1e-5), thereby rapidly identifying candidate orthologs with potential functional relevance (Fig. 6A). The iSect tool supports coordinate-based extraction of gene sequences (e.g., the 2,000 bp upstream of a gene's 5′ end) (Fig. 6B) and outputs sequences in FASTA format for downstream applications such as promoter analysis. The Phylogenetic Tree Builder, based on the Neighbor-Joining algorithm, quickly constructs phylogenetic trees for user-defined sets of IPC genes; the trees are shown in rectangular or circular layouts (Fig. 6C) and can be exported in Newick format for further editing. The Primer Design Tool, powered by the Primer3 core (a dynamic programming algorithm), lets users specify product length and annealing temperature. It validates specificity in situ against the IPC reference genome and automatically excludes primers that span exon-exon junctions or bind to non-target regions, improving experimental success rates (Fig. 6D).
Fig. 6. Tools integration. A SequenceServer: a web-based BLAST search tool for local genomic sequence analysis. B Gene sequence extraction tool. C Phylogenetic tree construction. D PrimerServer: an automated web-based tool for designing PCR primers with customizable constraints
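Extracting “2,000 bp upstream of the 5′ end” depends on strand: for a plus-strand gene the upstream region lies to the left of the gene start, while for a minus-strand gene it lies to the right of the gene end and must be reverse-complemented to read in the gene's orientation. A minimal sketch, assuming 1-based inclusive gene coordinates, clipping at chromosome ends, and a plain FASTA writer; it is not the iSect implementation, and the function names and toy chromosome are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq, gene_start, gene_end, strand, length=2000):
    """Sequence of up to `length` bp immediately upstream of a gene's 5' end.

    gene_start/gene_end are 1-based inclusive; the result is in the gene's
    orientation and is clipped (shortened) at chromosome ends.
    """
    if strand == "+":
        lo, hi = max(1, gene_start - length), gene_start - 1
        return chrom_seq[lo - 1:hi]
    if strand == "-":
        lo, hi = gene_end + 1, min(len(chrom_seq), gene_end + length)
        return reverse_complement(chrom_seq[lo - 1:hi])
    raise ValueError("strand must be '+' or '-'")


def to_fasta(name, seq, width=60):
    """Format one record as FASTA with fixed-width sequence lines."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


chrom = "AACCGGTTACGT"  # toy 12-bp chromosome
plus_promoter = upstream_region(chrom, 9, 12, "+", length=4)
minus_promoter = upstream_region(chrom, 5, 6, "-", length=4)
```

For the minus-strand gene at positions 5-6, the upstream bases are positions 7-10 (`TTAC`), returned as their reverse complement `GTAA`. Clipping means a gene near a chromosome end yields a shorter promoter rather than an error.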
Discussion and perspectives
In this study, we constructed IGD, the first database for IPC built on genomic and transcriptomic data. IGD provides a comprehensive, high-quality multi-omics dataset for IPC, including genomic information, gene sequences with functional annotations, and spatiotemporal gene expression dynamics. On top of these data, we developed and integrated multiple interactive bioinformatics analysis and visualization tools. The platform will help researchers elucidate the regulatory networks underlying the growth, development, and stress resistance of IPC and identify key functional genes, advancing both the systematic understanding and the applied study of salt tolerance mechanisms in halophytes.
In summary, the data and tools provided by IGD demonstrate its potential for fundamental research and for genetic improvement and breeding. We plan to update IGD regularly and to integrate more types of omics data for IPC, as well as data from other halophytes such as Suaeda glauca [23], expanding the scientific value and service capacity of the database in support of saline-alkali land remediation and bioresource utilization. We will also investigate gene family characteristics and transcriptomic regulatory mechanisms in IPC to promote the discovery and validation of functional genes, providing a foundation for genetic improvement and functional gene research in halophytes. We expect IGD to become an important data platform for halophyte research, supporting progress in both basic science and industrial applications.
References
- 1. Bose J, Rodrigo-Moreno A, Shabala S. ROS homeostasis in halophytes in the context of salinity stress tolerance. J Exp Bot. 2014;65:1241–57. doi:10.1093/jxb/ert430. PMID: 24368505.
- 2. Cheng Y, Wang Y, Sun J, Liao Z, Ye K, Hu B, et al. Unveiling the genomic blueprint of salt stress: insights from Ipomoea pes-caprae L. Seed Biol. 2023;2. doi:10.48130/SeedBio-2023-0021.
- 3. Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation. 2021;2. doi:10.1016/j.xinn.2021.100141. PMCID: PMC8454663. PMID: 34557778.
- 4. Yu G. Thirteen years of clusterProfiler. Innovation. 2024;5. doi:10.1016/j.xinn.2024.100722. PMCID: PMC11551487. PMID: 39529960.
