Deciphering the Plastome and Molecular Identities of Six Medicinal “Doukou” Species
Ying Zhao, Amos Kipkoech, Zhi-Peng Li, Ling Xu, Jun-Bo Yang

TL;DR
This study identifies six medicinal plant species using genetic markers to improve plant identification and conservation.
Contribution
The study introduces the most effective genetic markers for distinguishing six morphologically similar medicinal plant species.
Findings
ITS, ITS1, and complete plastome sequences best identified three out of six 'Doukou' species.
ycf1 sequences identified two species, while ITS2, matK, and psbA-trnH identified only one each.
rbcL failed to identify any species among the six 'Doukou' plants.
Abstract
The genus Amomum includes over 111 species, 6 of which are widely utilized as medicinal plants and have already undergone taxonomic revision. Due to their morphological similarities, the presence of counterfeit and substandard products remains a challenge. Accurate plant identification is, therefore, essential to address these issues. This study utilized 11 newly sequenced samples and extensive NCBI data to perform molecular identification of the six medicinal “Doukou” species. The plastomes of these species exhibited a typical quadripartite structure with a conserved gene content. However, independent variation shifts of the SC/IR boundaries existed between and within species. The comprehensive set of genetic sequences, including ITS, ITS1, ITS2, complete plastomes, matK, rbcL, psbA-trnH, and ycf1, showed varying discrimination of the six “Doukou” species based on both distance and…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8- —Obtaining Super Barcodes of Important Wild Plants in Gaoligong Mountain
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural product bioactivities and synthesis · Synthesis of Organic Compounds · Phytochemical Studies and Bioactivities
1. Introduction
Species identification is crucial in the fields of biology and ecology [1] and serves as the basis for ecological research, enabling the understanding of species richness and biodiversity [2]. It informs conservation efforts through the identification of endangered invasive and keystone species [3]. Additionally, it plays a crucial role in predicting and preventing infectious disease outbreaks by identifying potential disease hosts and transmitters among wild animal species [4]. In food production industries, species identification ensures authenticity, quality, and safety, preventing fraud and the circulation of substandard products [5]. In criminal and forensic cases, it aids in identifying the origin of wildlife products [6]. Traditional methods of species identification, relying on morphological characteristics, have limitations in discriminating taxa with limited morphological differences or complex phylogenetic history. DNA barcoding technology has emerged as an effective advancement to overcome these challenges [7].
DNA barcoding is a molecular technique that identifies biological species by examining distinct DNA segments, utilizing variations in short DNA sequences to provide rapid and reliable species identification [5,8,9,10,11,12]. The concept of DNA barcoding was first proposed by Paul Hebert, who suggested using a small, highly conserved genetic sequence called the “ribosomal RNA gene region” to identify species [5]. Initially, DNA barcoding was widely used in animals, where the gene-encoding cytochrome c oxidase I (COI) in mitochondria has a high species differentiation potential, especially in insects, birds, and fish [13,14]. Therefore, the COI gene has become the preferred choice for universal DNA barcoding in animals due to its high level of accuracy in species identification [15]. However, in plant mitochondrial genomes, the COI gene shows a high degree of conservation and is not suitable as a DNA barcode [16]. In addition, complex evolutionary events such as hybridization, polyploidization, and incomplete lineage sorting are more common in plants than in animals, further increasing the difficulty of screening fragments suitable for DNA barcoding [17]. Currently, the internationally recognized universal plant DNA barcodes include four gene regions, including ITS (internal transcribed spacer: internal transcribed spacer 1-5.8S-internal transcribed spacer (2), matK, rbcL, and psbA-trnH [18]. The selection of these gene regions considers the genetic diversity and evolutionary history of the plant kingdom to enhance the effectiveness and usefulness of plant DNA barcodes. However, these fragments have limitations. As an alternative, ultra-barcoding using the complete plastomes for plant species identification has been proposed [19]. Although discerning closely related species using DNA barcoding can pose challenges, this technique is promising in distinguishing morphologically indistinguishable species but genetically distinct [20].
Amomum Roxb. is the second-largest genus in the Zingiberaceae Martinov family after Alpinia and includes approximately 111 [21] to 150 [22,23] species distributed in tropical Asia and Australia, particularly in Southeast Asia, such as India, Malaysia, and Indonesia [23]. In China, Amomum comprises 39 species (29 endemic and 1 introduced) [23], mainly distributed across the Fujian, Guangdong, Guangxi, Guizhou, Yunnan, and Tibet provinces [22]. Among these, six species are listed in the Chinese Pharmacopoeia [24]. These species, originally classified within the genus Amomum, have undergone taxonomic revision [25]. These encompass (1) Lanxangia tsaoko (Crevost & Lemarié) M. F. Newman & Škorničk (synonym: A. tsaoko Crevost et Lemarie), (2) Wurfbainia compacta (Sol. ex Maton) Škorničk. & A. D. Poulsen (synonyms: A. compactum Sol. ex Maton and Zingiber compactum (Sol. ex Maton) Stokes), (3) W. longiligularis (T. L. Wu) Škorničk. & A. D. Poulsen (synonym: A. longiligulare T. L. Wu), (4) W. vera (Blackw.) Škorničk. & A. D. Poulsen (synonyms: A. krervanh Pierre ex Gagnep. and A. verum Blackw.), (5) W. villosa (Lour.) Škorničk. & A. D. Poulsen (synonyms: A. villosum Lour., Cardamomum villosum (Lour.) Kuntze, and Z. villosum (Lour.) Stokes), and (6) W. villosa var. xanthioides (Wall. ex Baker) Škorničk. & A. D. Poulsen (synonyms: A. xanthioides Wall. ex Baker, A. villosum var. xanthioides (Wall.ex Bak.) T. L. Wu & S. J. Chen, and C. xanthioides Wall. ex Kuntze) [25]. They exhibit a diverse range of traits and applications. For instance, W. compacta is a widely used culinary spice and its fruits, leaves, and seeds have a wide range of pharmacological activities in traditional medicine, such as antifungal, antibacterial, antioxidant, gastroprotective, anti-inflammatory, immunomodulatory, anticancer, antiasthmatic, and treatment of acute renal failure [26]. The fruits of W. vera have shown antibacterial activity [27]. The active ingredients in W. longiligularis and W. villosa var. xanthioides have antibacterial activity [28,29]. In addition, the powerful antioxidant properties of W. villosa var. xanthioides have been used in the treatment of non-alcoholic fatty liver disease (NAFLD) and non-alcoholic steatohepatitis (NASH) [30]. L. tsaoko has been found to contain antifungal active substances [31] and antioxidant ingredients [32], indicating its potential medicinal properties; recent studies suggested that it can relieve constipation and could be a promising candidate for developing laxatives [33]. The total flavonoids extracted from W. villosa have shown potential for developing new drugs to treat gastric cancer [34]. Chemical components found in the seeds of W. villosa can enhance cellular antioxidant activity [35]. Additionally, Chen et al. (2018) have confirmed the potential beneficial effects of W. villosa in treating inflammatory bowel disease [36]. Li et al. (2016) demonstrated that the fresh stems and leaves of W. villosa can be used as high-quality feed for cattle, sheep, and other grass-eating livestock [37]. Despite these several benefits, their morphological similarities make it challenging and confusing between species. Therefore, molecular identification through DNA barcoding is crucial for accurately identifying Amomum species.
Recent studies have utilized various universal barcodes, with the ITS sequence being approximately 500–700 bp long While the length of the ITS sequence is relatively conserved, its sequence exhibits significant variability, making it useful for species differentiation [38]. The sequencing and analysis of ITS are rapid and cost-effective, compared to traditional morphological classification methods. Additionally, an extensive repository of ITS is available in public databases, providing researchers with a wealth of reference resources that facilitate convenient species identification and classification. The GenBank database at the National Center for Biotechnology Information (NCBI) hosts an extensive collection of ITS sequences for Amomum and its taxonomic synonyms. As of April 11, 2024, it includes 572 sequences representing 159 species. This vast dataset is a valuable resource for this study, providing comprehensive and diverse information. Selvaraj et al. (2012) revealed that ITS and ITS1 are successful DNA barcodes for differentiating Boerhavia diffusa Linnaeus from counterfeit medicinal plants [39]. The ITS2 (internal transcribed spacer 2) region has been utilized for the identification of medicinal plants and their closely related species [40] in the Polygonaceae A. L. Jussieu family [41] and the Dendrobium Sw. genus [42]. ITS2 is demonstrated to be the most promising universal DNA barcode for the Zingiberaceae family [43]. The complete plastome, matK, and rbcL sequences have shown to effectively distinguish W. compacta, W. longiligularis, and W. villosa [44]. Notably, the matK and the psbA-trnH intergenic spacer exhibited high identification efficiency for L. tsaoko and other Amomum species [45]. The most efficient barcodes for the molecular identification of Amomum are ITS [46,47,48], ITS1 [49,50], and ITS2 [51,52,53]. These findings have demonstrated the promising potential application of DNA barcoding in species identification and classification within Amomum. DNA barcoding can facilitate accurate identification and classification of different Amomum species, which helps to understand their diversity and evolutionary relationships, and it is an effective tool that guides methods for Amomum’s protection, sustainable utilization, and medicinal value research.
In this study, we employed a combination of newly sequenced data and additional data from the NCBI database, including (1) ITS, (2) ITS1, (3) ITS2, (4) complete plastomes, (5) matK, (6) rbcL, (7) psbA-trnH, and (8) ycf1 to facilitate the evaluation and precise identification of six medicinal plants within the Amomum genus. Using DNA barcoding, we evaluated the value of barcodes in identifying different Amomum medicinal species, thereby reducing the potential errors associated with traditional morphological methods. Our findings will enhance the sustainable management and conservation of Amomum resources, thereby facilitating industrial growth and quality control. Ultimately, this will lead to substantial scientific and societal advantages.
2. Results
2.1. Plastome Structural Variation, Sequence Divergences, and Hypervariable Regions
All 41 individuals from the 6 examined “Doukou” species exhibited a quadripartite structure (Figure 1) and showed limited intraspecific variation in plastome size (Table S1). The complete plastomes ranged in size from 162,678 to 164,332 bp. The lengths of the large single-copy (LSC), small single-copy (SSC), and inverted repeat (IR) regions ranged from 87,632 to 89,067 bp, 14,895 to 15,754 bp, and 29,642 to 29,971 bp, respectively (Table S1). There was a slight variation in the total GC content, which ranged from 36.0% to 36.4% (Table S1). However, the GC content was higher in the IR regions (41.0–41.2%) compared to the LSC (33.7–34.1%) and SSC (29.6–30.3%) regions (Table S1). The “Doukou” plastomes were highly conserved and encode between 121 and 133 genes, including 82 to 87 protein-coding genes, 8 rRNA genes, and 30 to 38 tRNA genes (Table S1).
We compared the contraction and expansion of IRs regions at four junctions between the two IRs (IRa and IRb) and the two single-copy regions (LSC and SSC) among six “Doukou” species (Figure 2). The LSC/IRb boundary was embedded in the rpl22-rps19 region (except for W. compacta YWB91902-1 and W. vera YWB91901-1, which were directly at the rpl22 gene); the IRb/SSC and SSC/IRa boundaries were within the ycf1 gene; the IRa/LSC boundary was in the rps19-psbA region. The boundary shifts exhibited independent variations both between and within species.
The nucleotide diversity (Pi) values were calculated with DnaSP v.5 [54] to test divergence level across different regions within the complete plastomes of the six “Doukou” species and their taxonomic synonyms. The average value of nucleotide diversity (Pi) was 0.00469. The nucleotide diversity (π) value ranged from 0 to 0.02354 across the plastomes, and the most hypervariable region was ycf1 (Figure 3).
2.2. Sequence Characteristics
The matrix characteristics of ITS, ITS1, ITS2, complete plastomes, matK, rbcL, psbA-trnH, and ycf1 of six medicinal “Doukou” species and their taxonomic synonyms are listed in Table 1. ITS2 had the highest percentage of variable sites, but complete plastomes had the most variable sites. The same was true for singleton sites (Table 1). ITS1 had the highest percentage of parsimony-informative sites (Table 1).
2.3. Distance Based Species Discrimination
Analyses of intra- and interspecific Kimura 2-parameter (K2P) distances identified varying barcoding gaps among six medicinal “Doukou” species and their taxonomic synonyms across different datasets. In a barcoding gap analysis, the ITS1 and complete plastome barcodes exhibited the highest discriminatory power, successfully identifying 50% of the species (3/6; Table S2; Figure 4. The ITS and ycf1 barcode were the next most effective, identifying 33% of the species (2/6; Table S2; Figure 4). The ITS2 and psbA-trnH barcodes could only identify one species each, accounting for 17% of the species (1/6; Table S2; Figure 4). The matK and rbcL barcodes could not identify any species (Table S2; Figure 4. In the ABGD analysis, ITS and ITS1 performed best (3/6; 50%; Table S3), followed by ycf1 (2/6; 33%; Table S3); while the other five performed the worst (1/6; 17%; Table S3). The number of generated OTUs varied across the ABGD analysis with the different prior intraspecific divergence in the initial and recursive partitions (Table S3).
2.4. Tree Based Species Discrimination
In the ITS dataset, due to the abundance of sequences for W. villosa (synonyms: A. villosum, C. villosum, and Z. villosum), maximum likelihood (ML), and Bayesian inference (BI) trees were initially constructed for all individuals (Figures S1 and S2). Subsequently, three individuals from the W. villosa (A. villosum, C. villosum, and Z. villosum) branch of the ML tree were chosen to contribute to the construction of subsequent ITS, ITS1, and ITS2 trees. Similarly, in the matK (Figures S3 and S4) and rbcL (Figures S5 and S6) datasets, three individuals were selected from the same branch in the ML tree for the construction of subsequent matK and rbcL trees. In both cases, these individuals were chosen from the top, middle, and bottom of the branch to represent the full range of genetic diversity.
The ML and BI topologies derived from seven of the eight datasets for the six species were congruent in showing which species were monophyletic (Figure 5, Figure 6 and Figures S7–S18), except the ITS1 dataset, which differed from the others (Figure 7 and Figure S19). Across all datasets, including ITS, ITS1, ITS2, complete plastome, matK, rbcL, psbA-trnH, and ycf1, L. tsaoko, and all individuals of its taxonomic synonyms, formed a monophyletic group, demonstrating the successful identification of L. tsaoko (Figure 5, Figure 6, Figure 7 and Figures S7–S19). Similarly, W. compacta, along with all individuals of its taxonomic synonyms, formed a monophyletic group in the ITS, ITS1, complete plastome, and ycf1 datasets (Figure 5, Figure 6, Figure 7 and Figures S7, S10, S17 and S18). In the ITS, ITS1, and complete plastome datasets, individuals of W. vera and its taxonomic synonyms exhibited a monophyletic group (Figure 5, Figure 6, Figure 7 and Figures S7, S10 and S19). Overall, the ITS, ITS1 and complete plastome datasets can successfully identify L. tsaoko, W. compacta, and W. vera (3/6; Figure 5, Figure 6, Figure 7 and Figures S7 and S10); ycf1 can successfully identify L. tsaoko and W. compacta (2/6; Figures S17 and S18); the ITS2, matK, and psbA-trnH datasets can successfully identify L. tsaoko (1/6; Figures S8, S9 and S11–S14); the rbcL dataset cannot identify any species (Figure 8 and Figures S15 and S16). However, W. longiligularis, W. villosa, W. villosa var. xanthioides, and their individuals of taxonomic synonyms did not form monophyly in the four datasets (Figure 5, Figure 6, Figure 7 and Figures S7–S19).
3. Discussion
3.1. Plastome Characteristics and DNA Barcode Performance
The plastomes of “Doukou” were highly conserved and exhibited a typical quadripartite structure, a characteristic shared with nine species within the subfamily Zingiberoideae. [55], Zingiber Boehm. [56], various species of Curcuma L. [57], and other photosynthetic angiosperms [58,59,60]. In the six medicinal “Doukou” species, the maximum possible species discrimination was 3/6 because W. longiligularis, W. villosa, and W. villosa var. xanthioides were non-monophyletic for ITS, ITS1, plastome, and plastome-standard barcodes (Figure 5, Figure 6, Figure 7 and Figures S7–S19).
Taxon-specific markers present a feasible alternative that balances the costs of comprehensive super-barcodes, such as whole plastomes, against the limited genetic variability often found in standard barcodes. For the six medicinal “Doukou” species, we identified the most significant mutational hotspots as the ycf1 gene with a π value of 0.02354 (Figure 3), similar to other members of the Zingiberaceae family [46,57]. This was consistent with the ycf1 (Figure 8 and Figures S17 and S18), having a higher identification rate than the matK, rbcL, and psbA-trnH barcodes. The four conventional barcodes (ITS2, matK, rbcL, and psbA-trnH) were each only able to reliably identify a single species at most, so the ycf1 gene region could serve as a viable alternative for species identification for the revised Amomum species. Given the financial demands of complete plastome sequencing, this gene region can offer a cost-effective and efficient method for population genetic research on Amomum. Additionally, this approach aids in the development of a growing database for taxon-specific barcodes.
3.2. Performance Comparison of Species Delimitation Methods
Consistent with previous research [46,47,48], species delimitation outcomes vary with the data and methodologies applied. Among the methods evaluated—ABGD, BG, BI, and ML—ML stood out as the most effective species identification, closely followed by BI, as illustrated in Figure 8. Additionally, the topological structures produced by ML and BI are largely similar, suggesting that these methods consistently achieve the highest identification rates (Figure 8). While the identification rates for ABGD and BG differ, ABGD generally outperforms BG (Figure 8), leading to a method ranking of ML > BI > ABGD > BG. Given the demonstrated robustness and efficiency of ML and BI in this study, these methods are recommended as the preferred approaches for species delimitation in DNA barcode-based identification, particularly when employing super-barcodes.
3.3. DNA Barcoding in Six Medicinal “Doukou” Plants
Previous studies have indicated that the standard barcodes are not sufficient for the identification of several medicinal plants within the genus Amomum [44,48,49,50,51,53,61]. The complete plastomes have demonstrated a strong capability to differentiate species of Amomum [44]. The results of this study have further validated these findings. The ability to distinguish species of Amomum is enhanced by the length of the complete plastome sequence, which is approximately 160,000 bp long, and its inclusion of many informative sites. However, the sequencing and analysis of the complete plastomes are considerably more expensive and resource-intensive than short fragments such as ITS sequences. ITS sequencing is more cost-effective and demands fewer computational resources for analysis. Despite its relatively short length of approximately 600 bp, the informative sites within the ITS region can accurately distinguish among L. tsaoko, W. compacta, and W. vera, similar to the capabilities of ITS1. ITS2 can only successfully identify L. tsaoko. Although ITS2 contains the highest proportion of variable sites, the complete plastomes hold numerous variable sites (Table 1).
Previous studies have shown that the identification rate of ITS/ITS1/ITS2 is higher compared to the plastome fragments [51,62,63]. This may be because the plastome only contains maternal genetic information [63], while ITS/ITS1/ITS2 contain richer biparental genetic information [64]. ITS sequences typically have multiple copies, which may increase ITS variability and improve the accuracy of species identification. Conversely, plastome fragments may only have a single copy, limiting their identification capabilities in some taxa. Notably, some taxa may contain hybrid or show hybridization leading to difficulties in species identification using plastome fragments. In this case, ITS sequences may better reflect the genetic differences between such species, thereby improving the identification rate.
In the eight datasets, some individuals were placed within monophyletic groups (Figure 5, Figure 6, Figure 7 and Figures S1–S17), possibly due to misidentification. The inclusion of some non-target species individuals in the monophyletic branches might be due to errors in species identification, given that the NCBI database has an extensive range of sources. Previously, several studies solely relied on distance for species identification. However, subsequent research has indicated that the barcode gap may be a result of errors in under-sampled taxonomic groups [65]. Therefore, when carrying out species identification, it is important to incorporate extensive analyses. The relationship between the minimum interspecific distance and the maximum intraspecific distance among the six species, along with the consistency between the ABGD grouping results and the tree results, provides strong evidence to support the species identification and classification of these species.
3.4. NCBI Database as a Resource and ITS vs. ITS1 Identification Rate
The NCBI database has provided comprehensive biological and biomedical information [66], offering a vast collection of genetic sequences, gene expression data, protein structures, and scientific literature. Its user-friendly interface and open-access policy promote global scientific collaboration. However, challenges include navigating through the extensive data and ensuring the quality and accuracy of the information due to varying submissions from researchers and institutions. This research was conducted based on a large amount of NCBI data, and reliable results were obtained. The NCBI database provides great convenience for research.
The ITS region has been proposed as a standard DNA barcode marker in fungi [67] and plants [68]. In our study, the identification rate of ITS1 was higher than that of ITS2, which aligned with the view that ITS1 is a better barcode than ITS2 in eukaryotes [69]. Despite the evaluation of ITS1 and ITS2 as meta-barcode markers for fungi [70], their identification efficacy as DNA barcode markers varies across different taxa. In this study, the individuals used across the ITS, ITS1, and ITS2 datasets are consistent. Therefore, this research serves as a reference, suggesting that the ITS1 dataset might be considered first in practical applications when the experimental individuals are identical.
Although ITS is significantly longer than ITS1, there is no noticeable difference in the difficulty of amplification between the two. Even though the ITS dataset had more variable sites than ITS1, it did not necessarily mean that it surpasses ITS1 in identification rate. Notably, the percentage of variable sites within the ITS1 sequence is higher than that of variable sites within the entire ITS sequence. Many factors can affect the identification rate, including the presence of key variable sites that can significantly distinguish species, the amount of recognizable feature information within the dataset, and the size of the differences in the species being identified. In these aspects, the ITS1 dataset may be superior to ITS and, thus, have a higher identification rate.
The ITS1 sequence is approximately 100–200 bp in length. In most cases, it can be easily acquired through sequencing of polymerase chain reaction (PCR) amplicons. This process is cost-effective and can easily be obtained. Furthermore, abundant ITS sequences of the genus Amomum and its taxonomic synonyms can be directly extracted from the NCBI database. Through multiple analyses in this study, it has been mutually verified that ITS1 has the highest identification rate. This suggests that in future identification of these six medicinal “Doukou” plants, ITS1 should be considered first.
4. Materials and Methods
4.1. Taxon Sampling
Based on the phylogenetic relationships of the genus Amomum established by Boer et al. [25], we selected close relatives of six target species for our study. We consulted this study to identify the accepted names and synonyms for these six target species and their closely related species. Our data collection and analysis focused on these target species, their close relatives, and the taxonomic synonyms associated with both groups. We sampled 11 individuals from Wurfbainia and Lanxangia genera (Table 2), as well as numerous individuals represented by ITS, complete plastome, matK, rbcL, psbA-trnH, and ycf1 sequences from both Wurfbainia and Lanxangia and their taxonomic synonyms available on NCBI (Table S4). To download the second-generation sequencing data within these groups, we utilized the prefetch tool in SRA Toolkit v.3.1.0, accessible at https://github.com/ncbi/sra-tools, accessed on 4 March 2024, from the NCBI database. The cut-off date for downloading data from NCBI was 11 April 2024. The detailed species information that was sequenced is listed in Table 2, and all the information was uploaded to the NCBI GenBank database. Alpinia nigra (Gaertn.) Burtt (MF076960) and Alpinia galanga (L.) Willd. (AF478715) were chosen as outgroups for constructing the matrices of ITS, ITS1, and ITS2 sequences. For the complete plastome, matK, rbcL, psbA-trnH, and ycf1 matrices, A. nigra (MK940826) and A. galanga (MK940825) were selected as outgroups. The selection of outgroups was based on Gong et al. [53]. We downloaded 232, 31, 138, 224, 53, and 31 sequences of ITS/ITS1/ITS2, complete plastome, matK, rbcL, psbA-trnH, and ycf1, respectively, from NCBI (Table S4). Numerous sequences of W. villosa (synonyms: A. villosum, C. villosum, and Z. villosum) were recovered for the ITS, matK, and rbcL datasets. Initially, we constructed phylogenetic trees using all available data and subsequently selected three individuals from the W. villosa clade within the tree based on genetic distance for further analysis.
4.2. DNA Extraction, Sequencing, Assembly, and Annotation
We extracted total DNA from 0.2 g of the gel-dried leaves and herbarium samples using the modified 4 × CTAB method [71]. The quality of DNA was assessed using 1% agarose gel electrophoresis and a NanoDrop^®^ ND-1000 spectrophotometer. We constructed a DNA library (300–500 bp) using the NEBNext UItra II DNA library prep kit for Illumina and performed two-end sequencing (2 × 150 bp) on the DNBSEQ-T7 high-throughput platform, generating a total amount of data of no less than 3 Gb. The length of single-ended sequencing reads was 150 bp (sequencing strategy PE150). To convert SRA files downloaded from NCBI into FASTQ format, we used fasterq-dump from SRA Toolkit v.3.1.0 (https://github.com/ncbi/sra-tools, accessed on 4 March 2024). Then, we compressed the ‘fastq’ files into ‘fastq.gz’ format suitable for GetOrganelle assembly using the open-source tool pigz v. 2.2.5 (https://zlib.net/pigz/, accessed on 4 March 2024).
The ITS sequence, spanning approximately 600–700 bp, was first assembled utilizing GetOrganelle v.1.7.5.3 [72]. Following assembly, the resultant FASTG file and the reference from A. sericeum Roxb. (KY438097.1) were aligned using the Map function in Geneious v.9.0.2 [73] to prepare the sequence for annotation. Subsequently, annotation was performed through Geneious v.9.0.2 [73] with the reference to acquire the ITS sequence. ITS1/ITS2 sequences were then extracted based on annotation information using Geneious v.9.0.2 [73].
The plastome assembly and annotation methods of sequences were conducted following the protocol described by Li et al. [74]. The clean data obtained from high-throughput sequencing were directly assembled using GetOrganelle v.1.7.5.3 [72], and the complete circular plastid genome was automatically generated. In cases where the circular structure could not be obtained, results were visually inspected using Bandage v.0.8.1 [75]. Subsequently, reliable plastid genome contigs or scaffolds were identified by manually removing non-target contigs from the ‘fastg’ file. The selected sequences were manually edited and spliced to obtain a complete plastid genome. Annotation of the plastid genome was performed using Geneious v.9.0.2 [73], with the published genome of A. krervanh (NC_036935.1) as the reference, and then combined with ORF (open reading frame) for correction. The matK, rbcL, psbA-trnH, and ycf1 were extracted using Geneious v.9.0.2 [73] based on annotation information.
The ITS, ITS1, ITS2, complete plastome, matK, rbcL, psbA-trnH, and ycf1 matrices were constructed by aligning the sequences using the Mafft Multiple Alignment plugins in Geneious v.9.0.2 [73]. All annotated sequences were uploaded to GenBank, and accession numbers were assigned (Table 2).
4.3. Data Analysis
4.3.1. Plastome Structural Variation, Divergence, and Mutational Hotspot Analyses
In this study, we conducted a detailed examination of 41 plastomes from six medicinal “Doukou” species and their taxonomic synonyms, focusing on aspects such as genome size, gene content, which includes protein-coding genes, tRNAs, rRNAs, and GC content. We utilized Geneious v.9.0.2 [73] for comparative analyses to investigate the expansion and contraction dynamics of the inverted repeats (IRs) at the four junctions of these plastomes, with visualization facilitated by IRscope [76]. Furthermore, by employing a sliding window analysis using DnaSP v.5 [54], with settings adjusted to a step size of 200 bp and a window length of 600 bp, we successfully pinpointed the top three sequences as the most variable regions. To complement our research findings, we constructed a detailed physical circular map of the plastome using OGDRAW v.1.3.1 [77].
4.3.2. Sequence-Based Analyses
We conducted a distance-based analysis using matrices generated from a subset of target and closely related species individuals selected from all individuals of the Wurfbainia and Lanxangia genera and their taxonomic synonyms for tree construction according to Boer et al. [25]. Two primary species delimitation approaches were employed: barcoding gaps (BG) [78] and automatic barcode gap discovery (ABGD) [79]. To investigate the existence of barcoding gaps within each dataset (ITS, ITS1, ITS2, complete plastome, matK, rbcL, psbA-trnH, and ycf1), we conducted pairwise distance calculations implemented in MEGA-11 [80] using the K2P model. A scatter plot was employed to identify barcoding gaps by visualizing the relationship between the minimum interspecific distance and maximum intraspecific distance for the six species and their taxonomic synonyms. A species is considered accurately identified when the minimum interspecific distance is larger than its maximum intraspecific distance [81]. The ABGD analysis was conducted using an online platform (https://bioinfo.mnhn.fr/abi/public/abgd/, accessed on 4 March 2024), employing three distinct distance models: Jukes–Cantor [JC69], Kimura [K80] TS/TV 2.0, and simple distance. The analysis was configured with the following parameters: Pmin = 0.001, Pmax = 0.1, Steps = 10, X = 1.5, and Nb bins = 20. The best partition was identified as the one most closely aligning with the delimitation of nominal species among the partitions obtained.
4.3.3. Phylogenetic Tree-Based Analyses
We constructed phylogenetic trees based on ML and BI methods from eight datasets: (1) ITS, (2) ITS1, (3) ITS2, (4) complete plastome, (5) matK, (6) rbcL, (7) psbA-trnH, and (8) ycf1 sequences. The sequence matrices of each dataset were aligned using MAFFT implemented in Geneious v.9.0.2 [73]. The ML tree was constructed using RAxML v.8.2.11 [82] by the GTRGAMMAI model with 1000 rapid bootstrap replicates. MrBayes v.3.2.7 [83] was utilized for BI analyses runs with 1,000,000 generations, employing the best-fit model specified according to the optimal scheme selected by jModeltest v.2.1.7 [84] using the Akaike information criterion (AIC) criteria. Phylogenetic trees were then visualized by tvBOT v.3.0 [85]. Successful identification was considered when all individuals of the same species and their synonyms cluster into a single clade.
5. Conclusions
In this study, we examined the structural variations in plastomes and assessed the effectiveness of both standard and super DNA barcodes in species identification, focusing on intraspecific and interspecific variability within six medicinal “Doukou” species. Molecular identification of these Amomum species was achieved through the analysis of wide genetic markers, including ITS, ITS1, ITS2, complete plastome, matK, rbcL, psbA-trnH, and ycf1 sequences. Among the markers employed, ITS, ITS1, and complete plastome were highly effective in identifying L. tsaoko, W. compacta, and W. vera. The ycf1 barcode proved useful for identifying L. tsaoko and W. compacta. In contrast, ITS2, matK, and psbA-trnH were specifically effective only for identifying L. tsaoko. Conversely, rbcL was ineffective in distinguishing any of the species. In conclusion, the ITS, ITS1, and complete plastomes performed best followed by ycf1, and then ITS2, matK, and psbA-trnH, while rbcL performed worst with insufficient sites to discriminate any species. Consequently, considering factors such as cost-efficiency, ITS1 emerges as the most recommended marker for molecular identification within the Amomum genus. The methodologies utilized herein for the molecular identification of the six medicinal “Doukou” species form a basis for the conservation of wild plant resources, the rational utilization of medicinal plants, and the prevention of resource misappropriation. This study provides essential molecular tools for the precise identification of species, hence enhancing our understanding of the botanical and pharmacological aspects of “Doukou” medicinal plants.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Bickford D. Lohman D.J. Sodhi N.S. Ng P.K.L. Meier R. Winker K. Ingram K.K. Das I. Cryptic species as a window on diversity and conservation Trends Ecol. Evol.20062214815510.1016/j.tree.2006.11.00417129636 · doi ↗ · pubmed ↗
- 2Gotelli N.J. Colwell R.K. Quantifying biodiversity: Procedures and pitfalls in the measurement and comparison of species richness Ecol. Lett.2001437939110.1046/j.1461-0248.2001.00230.x · doi ↗
- 3SouléM.E. Wilcox B.A. Conservation Biology. An Evolutionary-Ecological Perspective Addison-Wesle London, UK 1980395
- 4Smith K.F. Behrens M. Schloegel L.M. Marano N. Burgiel S. Daszak P. Reducing the risks of the wildlife trade Science 200932459459510.1126/science.117446019407185 · doi ↗ · pubmed ↗
- 5Hebert P.D. Cywinska A. Ball S.L. De Waard J.R. Biological identifications through DNA barcodes Proc. R. Soc. Lond. B Biol. Sci.200327031332110.1098/rspb.2002.2218 PMC 169123612614582 · doi ↗ · pubmed ↗
- 6Linacre A. Gusmão L. Hecht W. Hellmann A.P. Mayr W.R. Parson W. Prinz M. Schneider P.M. Morling N. ISFG: Recommendations regarding the use of non-human (animal) DNA in forensic genetic investigations Forensic Sci. Int. Genet.2011550150510.1016/j.fsigen.2010.10.01721106449 · doi ↗ · pubmed ↗
- 7Kress W.J. Erickson D.L. DNA barcodes: Genes, genomics, and bioinformatics Proc. Natl. Acad. Sci. USA 20081052761276210.1073/pnas.080047610518287050 PMC 2268532 · doi ↗ · pubmed ↗
- 8Hebert P.D. Gregory T.R. The promise of DNA barcoding for taxonomy Syst. Biol.20055485285910.1080/1063515050035488616243770 · doi ↗ · pubmed ↗
