Exploring ddRAD sequencing data of tomato genotypes evaluated for the heat stress tolerance
Salvatore Graci, Amalia Barone

TL;DR
This study uses ddRAD sequencing to analyze 27 tomato genotypes for heat stress tolerance, providing genomic data to support breeding for climate resilience.
Contribution
The study provides a genomic resource for identifying candidate genes and variants in tomato genotypes under heat stress.
Findings
ddRAD sequencing was used to generate genomic data from 27 tomato genotypes evaluated for heat stress tolerance.
A VCF file was created to report polymorphisms among the genotypes, supporting future breeding and research efforts.
The raw sequencing data is publicly available in the SRA database for further analysis.
Abstract
Climate change is a major concern for agricultural crops, and the selection of tolerant genotypes in response to abiotic stresses represents an important breeding strategy to reduce yield losses. In addition, the continuous development of new and more accurate high-throughput technologies for the analysis of DNA sequences is the key to improve biological understanding and application of biological knowledge. In the present work, 27 tomato genotypes already evaluated for their response under high temperature conditions were sequenced by using the ddRAD sequencing technology. The main goal was to provide genomic data useful for identifying candidate genes and variants to cope with current climate changes. Total genomic DNA was extracted from leaves and sequenced on the HiSeq2500 Illumina instrument. Raw reads of the dataset were processed using different bioinformatics tools to generate a…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenetic Mapping and Diversity in Plants and Animals · Plant Reproductive Biology · Cocoa and Sweet Potato Agronomy
Specifications TableSubjectPlant ScienceSpecific subject areaInvestigation of ddRAD sequencing genomic data from tomato genotypes to identify suitable polymorphisms in response to abiotic stresses.Type of dataRaw fastq filesTableVariant Calling Format (VCF) fIleFigureData collectionThe genomic data of the 27 tomato genotypes were obtained using ddRAD sequencing technology. Total genomic DNA was extracted from 100 mg of young leaf tissue from all the genotypes using the DNeasy plant mini kit (Qiagen, Hilden, Germany). For DNA sequencing, samples were used to prepare libraries for the ddRAD sequencing, as described in Peterson et al. 2012, PLoS ONE 7, e37135 with minor modifications. MboI and SphI enzymes were used for restriction digestion and fragments sequenced with the V4 chemistry paired end 125 bp mode on the HiSeq2500 Illumina instrument.Data source locationDepartment of Agricultural Sciences, University of Naples Federico II, 80055 Portici (NA), ItalyData accessibilityRepository name: Sequence Read Archive (SRA-NCBI)Data identification number: BioProject: PRJNA1137563Direct URL to data: https://www.ncbi.nlm.nih.gov/sra/PRJNA1137563Instructions for accessing these data: The raw sequencing reads can be accessed and downloaded by visiting the direct URL.Related research articleFrancesca, S., Vitale, L., Graci, S., Addonizio, M., Barone, A., & Rigano, M. M. (2024). Integrated physiological and genetic data reveal key-traits for heat tolerance in tomato. Plant Stress, 100555.
Value of the Data
1
- •The ddRAD sequencing data can be applied to perform genomic selection, gene/QTL mapping, linkage mapping, physical mapping, association mapping, genome-wide association studies (GWAS), identification of candidate genes, phylogenetic analyses and marker-assisted selection (MAS) for the investigation of tomato tolerance to abiotic stresses.
- •These data represent a useful resource for researchers and breeders to investigate inheritance and genetics of complex traits, and identification of impactful variants for their application for marker-assisted breeding.
- •The genomic data of the 27 tomato genotypes can be combined to predict heterozygous loci resulting in hybrids obtained by using these material as parental lines in breeding programs.
Background
2
The meteoric increase in sequencing technologies with next generation sequencing has dramatically changed the understanding of plant genomic variability. These tools have been widely applied to investigate the molecular response mechanisms to abiotic stresses of plants due to climate changes. Among these, heat stress can have adverse effects on plant morphology, physiology, and biochemistry during all stages of vegetative and reproductive development, thus comporting yield losses [1,2]. In this context, the double-digest Restriction enzyme-Associated DNA (ddRAD) sequencing based on the Reduced Representation Sequencing (RRS) technology provides an economic and feasible approach to obtain a portion of the whole sequence of large and repetitive plant genomes by employing two restriction enzymes with low-frequency and high-frequency cutter to digest DNA [[3], [4], [5], [6]]. ddRAD data can be used to perform genomic selection, QTL mapping, genome-wide association studies (GWAS) and identification of candidate genes [7]. This high-throughput platform represents a valuable tool for researchers and breeders in the investigation of molecular response mechanisms and selection of tolerant genotypes to cope with losses caused by abiotic stresses. The ddRAD sequencing platform was used to obtain genomics sequences of 27 tomato genotype evaluated for their response to high temperatures.
Data Description
3
The raw fastq reads of all the 27 tomato genotypes of the present study resulted from ddRAD sequencing were deposited into the Sequence Read Archive (SRA) database with the PRJNA1137563 BioProject ID (https://www.ncbi.nlm.nih.gov/sra/PRJNA1137563). Results of the reads processing are listed in Table 1.Table 1. Results of the reads processing of each genotype. The number of reads resulted from sequencing, number of surviving reads obtained by the trimming step and the overall alignment rate of the mapping procedure are reported.Table 1. GenotypeReads (n)Trimmomatic - Surviving reads (n)Bowtie2 – Overall alignment rate (%)E71,232,5651,131,10797.60E81,261,3651,181,58497.67E171,085,5501,013,98197.42E201,115,8401,049,03096.70E23329,975304,10396.61E301,330,5151,236,82097.22E361,892,7221,758,52797.09E371,027,475950,81497.84E401,332,1211,253,46697.28E411,548,0501,440,91897.47E421,140,7541,062,49595.92E431,854,0511,734,64097.49E45844,507798,40997.31E48865,036806,71396.98E531,011,532941,58197.81E552,009,4241,861,93096.66E751,341,7561,238,44396.38E76995,709930,96897.29E1071,078,3731,010,26597.60E2011,810,6541,696,54697.62PDVIT1,791,9811,674,57695.27M821,771,5441,666,97896.91MoneyMaker1,531,4771,419,47097.33LA2662981,946918,23197.53LA31201,604,2921,488,25897.14DOCET912,522855,12297.24JAG88101,024,871956,10295.61
The Variant Calling Format (VCF) file is reported in Table S1 and evidenced a final dataset of 79,951 variants compared to the Heinz S. lycopersicum reference genome (SL4.0 version). Moreover, the number of homozygous and heterozygous variants of each genotype compared to the reference genome was also investigated and results are reported in Fig. 1.Fig. 1. Graphical representation showing the number of homozygous reference-, homozygous alternative- and heterozygous alleles compared to Heinz S. lycopersicum reference genome of each genotype sequenced by ddRAD sequencing technology.Fig 1
Considering the homozygous variants respect to the Heinz reference genome, their distribution along the tomato genome was also assessed for all the twenty-seven genotypes focusing on each of the twelve chromosomes (Table 2).Table 2. Distribution of the homozygous variants respect to the Heinz reference genome on each of the twelve tomato chromosomes.Table 2. GenotypeVariants (n)Ch01Ch02Ch03Ch04Ch05Ch06Ch07Ch08Ch09Ch10Ch11Ch12E72,0664496901,5031,7959211,1352,3721,9446929691,277E82,0764477091,5221,7858811,1322,3701,8516029762,171E172,0664187261,5491,9461,0081,1362,3581,8607119761,262E202,0983976941,4641,7739171,1362,3521,9406407931,148E232,0414527021,5351,7839731,1242,3631,9036238152,093E302,1304197431,5761,7549191,1492,3661,9556237981,209E362,0614546811,5621,7689101,1292,3901,9376329712,224E372,0544547081,5661,7759071,1402,3471,9236149412,170E402,0744337381,5761,7809301,1482,3461,9736309821,205E412,0654176881,5371,7709071,1382,3791,9377399622,165E424,5165346954,0691,8378674,2892,3831,8357621,0693,372E432,0944426791,4711,7739161,1252,3761,9456249541,294E451,6964237201,5441,7839221,1312,3451,9676287772,204E482,0474456961,5421,7779081,1342,3611,9246189332,134E532,0924617231,5401,8309151,1282,3851,8616119942,270E552,0775058474,0221,7781,4191,3422,4132,6088338192,266E752,0206221,1873,5874,6351,0611,1642,4531,9377769581,624E762,1004547381,5821,7809161,1532,3611,9476077702,291E1072,0744527371,5721,7679951,1322,3961,8621,2839752,292E2011,7434396981,5801,8598561,0912,3381,9157391,0571,647PDVIT2,2657418044,6475,9331,1211,2402,3733,8716292,7223,421M821,6314917154,0755,5169201,1472,3311,8467482,5771,651MoneyMaker2,0874537151,5291,7888951,1392,3841,8436111,0062,240LA26621,6224126701,4501,7908581,0942,3511,8637311,0782,217LA31202,0874367101,8201,8209281,0942,3991,8736309922,229DOCET1,5873856811,4881,9808721,1342,3101,8306847622,287JAG88101,5914096371,6291,9589581,1052,3151,8117541,1352,279
Experimental Design, Materials and Methods
4
A Variant Calling Format (VCF) file including sequences data retrieved by using ddRAD technology on 27 tomato genotypes [[8], [9], [10], [11]] was investigated. The 27 tomato lines were selected from a wide tomato germplasm collection available at the Department of Agricultural Sciences of the University of Naples Federico II, which includes genotypes showing high variability in terms of fruit shape and quality, plant habitus and tolerance to biotic and abiotic stresses [12,13]. A detailed list of the genotyped material is reported in Table 3.Table 3. List of the genotypes sequenced by using ddRAD sequencing technology. The code, the common name and the origin of the genotypes are reported.Table 3. CodeCommon nameOriginE7Corbarino PC04ItalyE8Corbarino PC05ItalyE17Pantano RomanescoItalyE20PizzutelloItalyE23SanMarzano 1-38 SMECItalyE30Sel PC07ItalyE36Vesuvio Foglia RicciaItalyE37Siccagno del VesuvioItalyE40GiaGiùItalyE41ParmitanellaItalyE42PI15250ItalyE43Principe BorgheseItalyE45SM246ItalyE48Vesuvio2001ItalyE53LA0147HondurasE55LA0358ColombiaE75Gold NuggetUSAE76Black PlumUSAE107E-L-19SpainE201--PDVITCannellino VitielloItalyM82M82USAMoney MakerMoney MakerUSALA2662Saladette-LA3120Malintka 101-DOCET F_1_DOCET F_1_SeminisJAG8810 F_1_JAG8810 F_1_Seminis
Raw FASTQ files were quality-filtered and trimmed using Trimmomatic [14] v.0.39 (http://www.usadellab.org/cms/?page=trimmomatic) with default parameters. Paired trimmed reads were aligned with the Solanum lycopersicum reference genome (Tomato Genome version SL4.0, available at the Solgenomics Network, www.solgenomics.net) using Bowtie-2 [15] with default parameters. The resulting BAM files were sorted, de-duplicated and indexed with Samtools [16]. Finally, the variant calling step was performed by using BCFtools mpileup [17] with default parameters. Only the positions showing no missing data for all the 27 genotypes were considered to generate the final VCF file. Finally, it was filtered by using VCFtools [18] setting the parameters as follows: minQ = 10 and minimum mean of depth of coverage (min–mean DP) = 10.
Limitations
The ddRAD sequencing analyses provide only a random reduced representation of the whole genome variability. Some polymorphisms in genes of interest may not be detected by the sequencing technology.
Ethics Statement
The authors have read and follow the ethical requirements for publication in Data in Brief and confirm that the current work does not involve human subjects, animal experiments, or any data collected from social media platforms.
Credit Author Statement
Graci Salvatore: Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Visualization. Barone Amalia: Conceptualization, Writing – review & editing, Funding acquisition.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Alsamir M.Mahmood T.Trethowan R.Ahmad N.An overview of heat stress in tomato (Solanum lycopersicum L.)Saudi J. Biol. Sci.2832021165416633373205110.1016/j.sjbs.2020.11.088PMC 7938145 · doi ↗ · pubmed ↗
- 2Graci S.Barone A.Tomato plant response to heat stress: a focus on candidate genes for yield-related traits Front. Plant Sci.142024124566110.3389/fpls.2023.1245661 PMC 1080040538259925 · doi ↗ · pubmed ↗
- 3Baird N.A.Etter P.D.Atwood T.S.Currey M.C.Shiver A.L.Lewis Z.A.Selker E.U.Cresko W.A.Johnson E.A.Rapid SNP discovery and genetic mapping using sequenced RAD markers P Lo S One 3102008 e 33761885287810.1371/journal.pone.0003376 PMC 2557064 · doi ↗ · pubmed ↗
- 4Miller M.R.Dunham J.P.Amores A.Cresko W.A.Johnson E.A.Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers Genome Res.17220072402481718937810.1101/gr.5681207 PMC 1781356 · doi ↗ · pubmed ↗
- 5Hamilton J.P.Buell C.Robin Advances in plant genome sequencing Plant J.70120121771902244905110.1111/j.1365-313X.2012.04894.x · doi ↗ · pubmed ↗
- 6Scheben A.Batley J.Edwards D.Genotyping-by-sequencing approaches to characterize crop genomes: choosing the right tool for the right application Plant Biotechnol. J.15220171491612769661910.1111/pbi.12645 PMC 5258866 · doi ↗ · pubmed ↗
- 7Chung Y.S.Choi S.C.Jun T.-H.Kim C.Genotyping-by-sequencing: a promising tool for plant genetics research and breeding Hortic Environ Biotechnol 582017425431
- 8Francesca S.Vitale L.Arena C.Raimondi G.Olivieri F.Cirillo V.Paradiso A.de Pinto M.C.Maggio A.Barone A.Rigano M.M.The efficient physiological strategy of a novel tomato genotype to adapt to chronic combined water and heat stress Plant Biol 241202262743460559410.1111/plb.13339 PMC 9293464 · doi ↗ · pubmed ↗
