Identification of Insertion and Deletion (InDel) Markers for Chickpea (Cicer arietinum L.) Based on Double-Digest Restriction Site-Associated DNA Sequencing

Duygu Sari

PMC · DOI:10.3390/plants13172530·September 9, 2024

Identification of Insertion and Deletion (InDel) Markers for Chickpea (Cicer arietinum L.) Based on Double-Digest Restriction Site-Associated DNA Sequencing

Duygu Sari

PDF

Open Access

TL;DR

This study identifies and develops InDel markers in chickpea using ddRADSeq data, providing useful tools for chickpea breeding and genomics.

Contribution

The study introduces a large set of InDel markers for chickpea, developed using ddRADSeq data and optimized for breeder-friendly genotyping.

Findings

01

Bioinformatic analysis identified 20,700 InDel sites in chickpea accessions using ddRADSeq data.

02

Most InDel markers were found in intergenic regions, and polymorphic information content ranged from 0.09 to 0.37.

03

29 InDel regions were selected and primers were successfully designed for genotyping.

Abstract

Enhancing the marker repository and the development of breeder-friendly markers in chickpeas is important in relation to chickpea genomics-assisted breeding applications. Insertion–deletion (InDel) markers are widely distributed across genomes and easily observed with specifically designed primers, leading to less time, cost, and labor requirements. In light of this, the present study focused on the identification and development of InDel markers through the use of double-digest restriction site-associated DNA sequencing (ddRADSeq) data from 20 chickpea accessions. Bioinformatic analysis identified 20,700 InDel sites, including 15,031 (72.61%) deletions and 5669 (27.39%) insertions, among the chickpea accessions. The InDel markers ranged from 1 to 25 bp in length, while single-nucleotide-length InDel markers were found to represent the majority of the InDel sites and account for 79% of…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Cicer arietinum

Chemicals1

agarose

Figures3

Click any figure to enlarge with its caption.

Funding1

—Akdeniz University Scientific Research Project Coordination Unit

Keywords

chickpeainsertiondeletionmarkerddRADSeq

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsGenetic and Environmental Crop Studies · Agricultural pest management studies · Legume Nitrogen Fixing Symbiosis

Full text

1. Introduction

Chickpea (Cicer arietinum L.) is one of the most significant legume crops worldwide, although it is mainly produced in India, Australia, Ethiopia, Turkey, and Myanmar [1]. Chickpeas are high in carbohydrates (60–65%), plant-based protein (20–22%), fat (6%), and dietary fiber, especially the soluble fiber raffinose [2]. They also contain several key vitamins and minerals, such as potassium, B-complex vitamins, iron, magnesium, and selenium [3]. Chickpeas are self-pollinated and diploid crops (2n = 2x = 16), and they have a relatively small genome size of ~740 Mbp [3]. Given the ongoing impacts of climate change, it is crucial to enhance agricultural productivity to ensure food security. Thus, the primary objective of plant breeding is to improve high-yield and high-quality varieties. Molecular-marker-assisted breeding helps to achieve these goals in a rapid and efficient manner. Indeed, molecular markers are efficient tools for biodiversity studies, segregation analyses, construction of genetic physical and genetic maps, and transcript profiling [4]. To date, random amplified polymorphic DNA (RAPD) [5,6], amplified fragment length polymorphism (AFLP) [7,8], inter-simple sequence repeat (ISSR) [9,10], and internal transcribed spacer (ITS) [11] markers have been studied in chickpeas. Moreover, in recent years, the development of high-throughput genotyping (or next-generation sequencing [NGS]) has prompted the discovery of high-quality genome-derived simple sequence repeat (SSR) and single-nucleotide polymorphism (SNP) markers in various natural and mapping populations of chickpeas [12,13,14,15]. SSRs are highly informative, abundant in the genome, easy to use, multiallelic, and locus-specific, and they have been widely utilized for plant-breeding applications due to their co-dominance and highly reproducible nature [16,17,18]. However, the unusually high variability and the presence of null alleles do not necessarily reflect patterns of genome-wide genetic diversity [19,20]. In addition, in recent years, SNPs have become important in genetic and genomics applications. SNPs are bi-allelic, co-dominant, and abundantly present in the genome [21,22]. Although several SSR and SNP markers have been developed in chickpeas [10,14,15,23,24,25], the narrow genetic base of cultivated chickpeas might limit the use of these molecular markers in revealing polymorphisms among the genotypes [26], which has inspired efforts to identify alternative markers for chickpea genomics-assisted breeding applications.

Insertion–deletion (InDel) markers are among the main sources of natural variation, and they are widely distributed across the genomes of different plants [27]. InDel markers generally appear due to the movement of transposable elements, unequal crossing over, or replication errors [28,29]. These InDel markers are a preferred alternative to sequence-based markers for genomics-assisted breeding applications. This is due to the desirable genetic features of InDel markers, which SSRs and SNPs also possess [30]. In fact, InDel markers are abundant and widely distributed in the genome, and they are also easily observed with specifically designed primers via simple polymerase chain reaction (PCR) systems and agarose gel electrophoresis, leading to less time, cost, and labor requirements [31,32,33]. So far, InDel markers have been used in some legumes, including common beans [29], lentils [34], and peanuts [35]. However, a small number of studies have investigated the development of InDel markers in chickpeas [27,31,36].

With the aim of enhancing the marker repository and the development of breeder-friendly markers in chickpeas, the present study focused on the identification and development of InDel markers through the use of double-digest restriction site-associated DNA sequencing (ddRADSeq) data from 20 chickpea accessions. The resultant markers were also tested on chickpea germplasm to evaluate their efficiency.

2. Results

A total of 154.9 M raw sequence reads (with a mean of 7.74 M) were generated using the Illumina HiSeq platform for the 20 chickpea accessions (Table 1). The guanine–cytosine (GC) content of the reads was 33%. The highest number of reads was 11.7 M in ICC552, whereas the lowest was 2.5 M in CA2969 (Figure 1). An overall alignment rate of 97.13% was obtained from the ddRADSeq reads mapped to the chickpea reference genome. Using the variant calling pipeline, 20,700 InDel sites, including 15,031 (72.61%) deletions and 5669 (27.39%) insertions, were identified among the accessions (Table 2). The InDel markers ranged from 1 to 25 bp in length. Single-nucleotide-length InDel markers were found to represent the majority of InDel sites and to account for 79% of the total InDel markers. However, we focused on the InDel markers wherein the length was greater than a single nucleotide to avoid any read or alignment errors. Considering all of the InDel markers, 96.1% were lower than 10 bp, 3.6% were between 10 and 20 bp, and 0.3% were higher than 20 bp in length.

All of the InDel markers were distributed across the eight chromosomes of chickpea, with the highest number being located in chromosome 4 and the lowest number in chromosome 8 (Table 3). The greatest number of insertions and deletions occurred in chromosome 4 and chromosome 6, respectively, whereas the smallest numbers of insertions and deletions were both seen in chromosome 8. The frequency of the InDel markers ranged from 5.49 InDel/Mb (chromosome 8) to 17.92 InDel/Mb (chromosome 4). In light of their genomic distribution and the simple visualization on agarose gels, we examined InDel regions of 10 bp and longer for the identification of InDel markers (Table 4).

There were 689 InDel regions, including 570 (83%) insertions and 119 (17%) deletions, with a length >10 bp examined in the chickpea genome (Table 4). Chromosome 4 exhibited the greatest number of insertions (109) and deletions (25), whereas the smallest number of insertions was observed in chromosome 1 and the smallest number of deletions was noted in chromosome 8. The chromosomal position, size, and sequence information of some insertions and deletions are provided in Tables S1 and S2, respectively. The greatest insertion (25 bp) was identified in chromosome 4 (physical position: 7271444), followed by chromosome 4 (position: 6839631) with 24 bp and then chromosomes 5 (physical position: 44703336), 6 (physical position: 7658552), and 7 (physical position: 33061740) with 23 bp (Table S2). The longest deletion (25 bp) was located in chromosome 2 (position: 7929861) (Table S2).

In total, 29 InDel markers were successfully designed in this study. The InDel markers were dispersed across all eight chromosomes. Figure S1 shows the PCR amplicons generated with the newly designed primers. Annotation analysis of the InDel markers revealed their highest frequency in the intergenic regions (82.76%), followed by the introns (6.90%), coding sequences (CDSs) (6.90%), and exons (3.45%) (Table S3). Moreover, the locus CA-D-5-385 was found to be related to the Callose Synthase 10 (CalS10) gene.

For the genetic diversity analysis of the 20 chickpea varieties, 21 InDel markers were tested in agarose gel, and the bands were scored according to the allele sizes (Table 5). A total of 42 alleles were observed among the accessions. All of the InDel markers exhibited two alleles. The Ne varied between 1.10 and 2.00. Moreover, the highest Ne was obtained in the loci CA-D-2-357 and CA-D-4-414. The average He and uHe values were 0.25 ± 0.04 and 0.26 ± 0.04, respectively. The average I was 0.39 ± 0.04, and the highest I (0.69) was obtained in the loci CA-D-2-357, CA-D-4-414, and CA-I-5-413. In addition, the polymorphism information content (PIC) varied from 0.09 to 0.37, with an average of 0.20. The loci CA-D-2-357, CA-D-4-414, CA-I-4-408, and CA-I-5-413 presented the highest PIC (0.37), followed by CA-I-6-531 (0.36) and CA-D-3-394 (0.35).

According to the PCoA of the molecular data, 25.33% and 18.23% of the total variation was explained by the first and second axis, respectively. Furthermore, the PCoA plots of PC1 versus PC2 using the factorial analysis of GenAlEx showed that the chickpea accessions from the different countries could be divided into three groups without considering their geographic origin (Figure 2). The neighbor-joining tree consisting of 20 chickpea genotypes was constructed with newly developed InDel markers (Figure 3). The tree showed three distinct clusters. Cluster I had nine accessions, mostly originating from Asia. Cluster II was also divided into two sub-clusters and included ten accessions from Asia, America, and Europe. Cluster III had only one accession from Europe.

3. Discussion

Climate change is associated with devastating environmental impacts, including extreme weather events (rainfall, droughts), rising sea levels, soil acidification, and the emergence of new pathogenic strains [37,38]. To help overcome these impacts, plant breeders have sought to exploit the natural variations in germplasm resources to develop climate-resilient crops. Molecular markers are important tools for characterizing genetic variation [39]. In the last few decades, the development of PCR-based markers, such as RAPD, AFLP, SSR, and SNP, has served to illuminate the genetic resources of chickpeas [6,8,15]. However, the narrow genetic base of the cultivated chickpea has encouraged breeders to identify alternative markers [40]. In addition, in terms of NGS-accelerated marker identification, ddRADSeq is an NGS protocol based on the development of multiplexed libraries by using two restriction enzymes for the genome complexity reduction [41]. In this study, we identified a total of 20,700 InDel sites using ddRADSeq, which indicated the suitability of this method for identifying InDel regions in the chickpea genome. The identified InDel markers varied among the chromosomes, which further confirmed the effectiveness of this method for genome-wide marker identification in chickpeas. Moreover, the identified markers may be used for biodiversity studies, segregation analyses, and the construction of genetic physical and genetic maps.

Based on the insertions and deletions, different InDel sites were identified among the accessions. The most prevalent type was the single-nucleotide InDel site, similar to the studies conducted in chickpea [31]. An increase in the InDel size caused a decline in the abundance of the InDel markers (Table 2). A negative relationship was observed between the InDel size and abundance, which has previously been reported by Jain et al. [31]. We did not observe any insertions or deletions longer than 25 bp. This could be a flaw on the part of ddRADSeq, as sequencing library preparation using this method can create sequence gaps during assembly [36]. Yang et al. [42] reported that InDel markers longer than 30 bp did not always cause more polymorphism. Our InDel frequency was determined with a density of 35.75 InDel/Mb in the whole genome (20,700 InDel sites in a genome size of 740 Mbp), similar to the frequency observed by Das et al. [27] but lower than the frequency obtained by Jain et al. [31]. The sequencing method, number of genotypes, or bioinformatic parameters applied during the variant calling might have caused these differences in the InDel frequency. When compared with other markers in chickpea, the number of InDel markers was less than the number of SNP markers [24,25,43] and more than the number of SSRs [44], which indicated that the efficiency of the InDel and SNP marker discovery in chickpea was higher than that associated with SSR markers when using NGS. However, due to the cost-effective and simple gel electrophoresis procedures, InDel markers are considered a viable alternative to SNP markers, which are relatively expensive and require a complex platform for genotyping.

All of the identified InDel markers were distributed across the eight chromosomes of chickpea. The greatest number of insertions and deletions was observed in chromosome 4 and chromosome 6, respectively, whereas the smallest numbers of insertions and deletions were both found in chromosome 8. These findings confirmed those of prior studies conducted in chickpeas, which described a large number of markers (SNPs or InDel markers) in chromosome 4 [36,45,46,47,48].

In terms of breeding programs, gel-based markers with breeder-friendly genotyping appear to be a better alternative than SNP or KASP markers when considering the available NGS technologies [49]. This led us to develop 29 agarose-resolvable markers that resulted in successful polymorphic bands among the accessions. Eventually, this provided an effective method for both ddRADSeq library preparation and scripts for InDel identification, resulting in 100% PCR efficiency. In addition, annotation analysis revealed the highest frequency of InDel markers in the intergenic regions (82.76%) (Table S3), with similar results having been observed in different crops [50,51,52]. Moreover, Parida et al. [53] recommended the applicability of polymorphic markers obtained from non-CDS components of chickpeas.

Exploiting genetic diversity is very important when it comes to using and conserving genetic resources and developing new breeding strategies. In the current study, the efficiency of the newly developed InDel markers was evaluated in chickpea germplasm, including twenty accessions from nine different regions in Africa, North America, Asia, and Europe. Analysis of the genetic diversity revealed that the number of Ne varied between 1.10 and 2.00. The average He and uHe values were 0.25 and 0.26, respectively. The average I was 0.39, which is greater than the value obtained by Aggarwal et al. [10]. The high values of genetic diversity parameters, such as Ne, He, uHe, and I, support the validity of these molecular markers for evaluating the genetic diversity of chickpea. The average PIC value of the 21 markers was 0.20, which was greater than the PIC values of the ISSR (0.125) [54] and SNP (0.12) [55] markers, and it was used to develop InDel markers and illuminate the genetic diversity of chickpea. Previous studies also reported higher PIC values for InDel markers in chickpea [36], rice [56], and radish [57]. However, the low PIC value range of the markers was not unexpected, given the limited genetic variation in the chickpea gene pool [58]. Among our results, there were nine InDel markers with PIC values greater than 0.25, accounting for 33.3% of the overall InDel markers. Botstein et al. [59] categorized the PIC values of markers as highly informative (≥0.5), reasonably informative (0.50–0.25), or least informative (≤0.25). Based on this characterization, nine of the markers developed in this study were reasonably informative. The PCoA results and phylogenetic analysis obtained using the 21 InDel markers indicated that the chickpea accessions were uniformly distributed in different groups regardless of their geographic origins. Hence, our findings are similar to those of previous research using RAPD markers [5], where a clear relationship among the accessions was not found.

4. Materials and Methods

4.1. Plant Material

A total of twenty chickpea accessions from nine different regions in Africa, North America, Asia, and Europe were evaluated for the ddRADSeq analysis in this study (Table 1). The highest number of accessions was from India (8), followed by Spain (3), Turkey (2), the Russian Federation (2), Mexico (2), Iran (1), the United States (1), and Ethiopia (1). The seeds of each accession in the collection were sown in pots for the DNA analysis.

4.2. DNA Extraction

The total genomic DNA was extracted from young leaves of the plants using the cetyltrimethylammonium bromide (CTAB) method described by Doyle and Doyle [60] with minor modifications, such as the use of extra chloroform–isoamyl alcohol and 70% ethanol cleaning steps to increase DNA purity. The DNA quality and quantity were checked by means of electrophoresis on 1% agarose gels, and the amount was normalized to 100 ng/μL using lambda DNA as the reference.

4.3. Library Preparation, Sequencing, and InDel Calling

The ddRADSeq library was prepared using a modified version of the method reported by Peterson et al. [41], where a reduced representative genomic library was prepared using restriction enzymes (VspI and MspI) via a modification to the ddRAD method. Briefly, each sample (200 ng DNA) was digested with two restriction enzymes—namely, VspI and EcoRI. Ampure XP beads (Beckman Coulter Genomics, Indianapolis, IN, USA) were used to clean the digestion products, and then P1 and P2 adapter ligations were carried out using T4 ligase buffer. To create an overhang in the VspI restriction cut site, the 3′ end of the P1 adapter was modified. After ligation, 15 cycles of PCR amplification with genotype-specific indexed primers were performed. The PCR products were visualized on agarose gel and then combined and equalized in concentration. The genomic library with an insert size of 400–500 bp was run on the Illumina HiSeq platform (Illumina, San Diego, CA, USA) using the 2 × 150 bp paired-end sequencing protocol. The ddRAD sequencing data of the 20 accessions were deposited in the National Center for Biotechnology Information (NCBI)’s Sequence-Read Archive (SRA) database under accession number PRJNA1064701.

For the bioinformatic analysis, the raw data were demultiplexed with Je software (v1.2) [61]. Quality control and preprocessing of the FASTQ files were performed using fastp [62], and the reads were trimmed by removing those bases with an average Phred score of less than 15. The cleaned data were mapped to the kabuli reference genome 1.0 [3] using the Bowtie 2 aligner (v2.2.6) [63]. Variant calling was performed in Freebayes (Galaxy Version 1.1.0.46-0) [64] with the genotype-specific individual alignment files in BAM format by selecting the following parameters: simple diploid calling with filtering and coverage of 20×. The SNPs were removed from the variant files using VCFfilter (Galaxy Version 1.0.0). The separate.vcf files containing the insertions and deletions were merged into a single data file. The combined variant file was organized in Microsoft Excel 2016 to remove any duplicated regions and arrange the InDel markers based on their sizes. InDel regions of 10 bp and longer were visualized with Integrated Genome Browser (IGB) V9.1.4 [65] using the BAM files of the genotypes and the chickpea reference genome.

4.4. Identification of InDel-Flanking Sequences and Primer Design

To develop the genome-wide InDel markers, the flanking sequences of the identified InDel markers were extracted as the target sequence based on the chickpea reference genome using IGB software V9.1.4. For the design of the forward and reverse primers, Primer3Plus (https://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi (accessed on 15 March 2024)) [66] was used with the following characteristics: primer length of 18–27 nucleotides, melting temperature of 53–62 °C, GC content of 40–60%, and PCR products of 150–500 bp long. The designed primers were later controlled for possible matches with other loci in the genome. All of the markers were named using the CA-D(I)-X-XXX format, where “CA” stands for chickpea; “D” and “I” stand for deletion and insertion, respectively; “X” stands for the chromosome number; and “XXX” stands for the start of the chromosomal position.

4.5. PCR Amplification

A total of 21 InDel markers, with around 3 markers evenly distributed on each chromosome, were selected from the designed primer pairs to be validated on 20 chickpea germplasms. For the PCR analysis, a total volume of 20 of reaction mix was used, which included 1 μL of genomic DNA, 1 μL of 10 × PCR buffer, 2.5 mM MgCl_2_, 0.3 μL of 10 mM dNTP mix, 0.3 μL of each primer (10 μM), 0.2 μL of Taq DNA polymerase (5 U/μL), and double-distilled H_2_O. The PCR reaction was conducted under the following conditions: 95 °C for 2.5 min, followed by 4 cycles of 95 °C for 45 s, 50 °C for 20 s, and 60 °C for 50 s; 30 cycles of 92 °C for 20 s, 50 °C for 20 s, and 60 °C for 50 s; and extension at 60 °C for 10 min [67]. The PCR products were separated on 3% agarose gels, visualized by means of ultraviolet (UV) light, and recorded as codominant data, with genotypes listed by fragment size.

4.6. Genetic Diversity Analysis

The calculations of the population genetic parameters, number of alleles (Na), number of effective alleles (Ne), Shannon diversity index (I), expected heterozygosity (He), unexpected heterozygosity (uHe) were performed. Principal coordinate analysis (PCoA) was performed using the GenAlEx V6.5 package [68], and the phylogenetic tree was constructed in DARwin version 5.0 software (https://darwin.cirad.fr/product.php (accessed on 10 May 2024)) using the neighbor-joining (NJ) method and modified in FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree (accessed on 26 May 2024)).

5. Conclusions

The development of NGS technologies has prompted the discovery of high-quality genome-derived markers. InDels are preferred alternative sequence-based markers for genomics-assisted breeding applications. This is due to their desirable genetic features, which SSRs and SNPs also possess. In addition, they can be used in regular laboratories, and they can easily observed with designed primers with less time, cost, and labor via simple PCR systems and agarose gel electrophoresis. In this study, we identified 20,700 InDel sites and developed 29 markers that might play an important role in chickpea genetic and genomic studies. The efficiency of these markers was also tested on 20 different chickpea accessions.

Bibliography68

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1FAO FAO Statistical Databases (FAOSTAT)Available online: https://www.fao.org/faostat/en/#data/QCL(accessed on 17 November 2022)
2Elango D. Wang W. Thudi M. Sebastiar S. Ramadoss B.R. Varshney R.K. Genome-wide association mapping of seed oligosaccharides in chickpea Front. Plant Sci.202213102454310.3389/fpls.2022.102454336352859 PMC 9638045 · doi ↗ · pubmed ↗
3Varshney R.K. Song C. Saxena R.K. Azam S. Yu S. Sharpe A.G. Cannon S. Baek J. Rosen B.D. Tar’An B. Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement Nat. Biotechnol.20133124024610.1038/nbt.249123354103 · doi ↗ · pubmed ↗
4Singh H. Deshmukh R.K. Singh A. Singh A.K. Gaikwad K. Highly variable SSR markers suitable for rice genotyping using agarose gels Mol. Breed.20102535936410.1007/s 11032-009-9328-1 · doi ↗
5Iruela M. Rubio J. Cubero J.I. Gil J. Milan T. Phylogenetic analysis in the genus Cicer and cultivated chickpea using RAPD and ISSR markers Theor. Appl. Genet.200210464365110.1007/s 00122010075112582669 · doi ↗ · pubmed ↗
6Talebi R. Fayaz F. Mardi M. Pirsyedi S.M. Naji A.M. Genetic relationships among chickpea (Cicer arietinum) elite lines based on RAPD and agronomic markers Int. J. Agric. Biol.20081047
7Nguyen T.T. Taylor P.W.J. Redden R.J. Ford R. Genetic diversity estimates in Cicer using AFLP analysis Plant Breed.200412317317910.1046/j.1439-0523.2003.00942.x · doi ↗
8Shan F. Clarke H.J. Yan G. Plummer J.A. Siddique K.H.M. Identification of duplicates and fingerprinting of primary and secondary wild annual Cicer gene pools using AFLP markers Genet. Resour. Crop Evol.20075451952710.1007/s 10722-006-0008-2 · doi ↗