Draft genome datasets for Cimex hemipterus from 454 Roche shotgun sequencings and Illumina HiSeq
Li Lim, Abdul Hafiz Ab Majid

TL;DR
This paper presents an updated draft genome of Cimex hemipterus using Illumina sequencing, showing better assembly than previous 454 Roche data.
Contribution
The study provides a more complete and better-organized genome assembly for Cimex hemipterus using newer sequencing technology.
Findings
The Illumina HiSeq dataset produced larger data volumes than 454 Roche sequencing.
The new assembly showed improved scaffolding compared to the older 454 Roche data.
The assembled genome is publicly available in the Figshare repository.
Abstract
The draft genome data for Cimex hemipterus obtained through Illumina HiSeq sequencing are presented. The raw genomic data were deposited in GenBank under BioProject PRJNA722579 with the BioSample accession number SAMN18780126. FLASH, SPAdes, and QUAST were used to merge the reads, assemble the genome, and assess assembly quality, respectively. The assembled genome is available in the Figshare repository. The assembly was compared with C. hemipterus data obtained using 454 Roche shotgun sequencing (BioProject PRJNA308532), downloaded from NCBI. The draft genome data from this work provide a larger data volume and an updated assembly of the C. hemipterus genome with better scaffolding than the genome data obtained from 454 Roche shotgun sequencing.
Taxonomy
Topics: Dermatology and Skin Diseases · Allergic Rhinitis and Sensitization · Dermatological diseases and infestations
Specifications Table

| Item | Description |
| --- | --- |
| Subject | Genetics |
| Specific subject area | Genomics and molecular biology |
| Type of data | Assembled genome data, Table |
| Data collection | Genomic DNA was extracted from a male tropical bed bug, Cimex hemipterus. The genome of tropical bed bugs was sequenced using the Illumina system |
| Data source location | School of Biological Sciences, Universiti Sains Malaysia, Gelugor, Penang, Malaysia |
| Data accessibility | The raw genome sequencing data were deposited in NCBI while the assembled data was deposited in Figshare. Both datasets are in FASTQ format. Repository name: NCBI, Figshare. Data identification number: (1) BioSample accession number, SAMN18780126 under BioProject (PRJNA722579); (2) Figshare. Direct URL to data: (1) https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA722579; (2) 10.6084/m9.figshare.16815364 |
| Related research article | None |
1. Value of the Data
- These genomic sequence data may help clarify the molecular details of C. hemipterus and related traits of this species.
- The genome sequence data of C. hemipterus enable research into the genetic information of this species.
- The sequence data will be useful for transcriptome and comparative genomic analyses of C. hemipterus.
- The data supplement and further enrich the C. hemipterus genome sequence data that are currently available.
2. Background
Draft genome sequencing of C. hemipterus, commonly known as the tropical bed bug, is a critical step in addressing the growing concerns associated with its resurgence and impact on public health. This hematophagous insect has been linked to severe allergic reactions, psychological distress, and the potential transmission of pathogens [1]. Unlike its counterpart, Cimex lectularius, the tropical bed bug thrives in warmer climates and has exhibited significant resistance to common insecticides, complicating control measures [2,3]. A comprehensive understanding of its genetic makeup is essential to combat the increasing infestations and develop targeted interventions.
Although datasets generated using 454 Roche shotgun sequencing are available, a new dataset is needed. The 454 sequencing technology, though pioneering in its time, has inherent limitations: lower throughput, shorter read lengths compared to newer technologies, and higher error rates, particularly in homopolymeric regions [4]. In contrast, Illumina sequencing technology offers significant advancements that can enhance the quality and utility of the genomic data for C. hemipterus.
3. Data Description
The genome assembled from the Illumina HiSeq reads has a total size of 388.66 Mb. The data from the present study were compared with the draft genome data of C. hemipterus provided by Seri Masran & Ab Majid, which were downloaded from the NCBI BioProject (PRJNA308532) [5]. In Seri Masran & Ab Majid's dataset, the assembled genome is 2.7 Mb [5]. The features of both genome datasets are summarized in Table 1.

Table 1. Statistics of assembled sequences of C. hemipterus

| Measure | Present study | Seri Masran & Ab Majid (2016) |
| --- | --- | --- |
| System | Illumina HiSeq sequencing | 454 Roche shotgun sequencing |
| Size | 388.66 Mb | 2.7 Mb |
| Largest contig (bp) | 49,408 | 575 |
| %GC | 35.36 | 35 |
| N50 | 8259 | 519 |
| N75 | 2165 | 57 |
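To make the measures in Table 1 concrete, the following is a minimal Python sketch of how largest contig, N50, N75, and %GC are commonly defined. It is an illustration only and is not QUAST's implementation; the function names `nx` and `gc_percent` and the toy contigs are hypothetical and are not data from this study.

```python
from typing import Iterable


def nx(lengths: Iterable[int], fraction: float) -> int:
    """Return the Nx value: the contig length L such that contigs of
    length >= L together cover at least `fraction` of the assembly."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


def gc_percent(sequences: Iterable[str]) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    gc = 0
    total = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        total += sum(s.count(base) for base in "ACGT")
    return 100.0 * gc / total if total else 0.0


# Hypothetical toy contigs for illustration (not from the study).
toy_contigs = ["ATGCGC" * 5, "ATATAT" * 3, "GGCC" * 2, "AT"]
toy_lengths = [len(c) for c in toy_contigs]

if __name__ == "__main__":
    print("Largest contig:", max(toy_lengths))
    print("N50:", nx(toy_lengths, 0.50))
    print("N75:", nx(toy_lengths, 0.75))
    print("%GC:", round(gc_percent(toy_contigs), 2))
```

Because N75 requires covering a larger share of the assembly than N50, it can never exceed N50 for the same set of contigs, which is consistent with both columns of Table 1.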
4. Experimental Design, Materials and Methods
The genomic DNA of C. hemipterus was extracted using a HiYield Genomic DNA isolation kit (Real Biotech Corporation, Taiwan) following the manufacturer's instructions. Before library preparation, gDNA was sheared with a Covaris M220 (Covaris, Inc.) to a mean fragment size of around 300 bp. The library was then constructed according to the manufacturer's protocol using the TruSeq™ DNA Sample Prep Kit and the cBot TruSeq PE Cluster Kit v3-cBot-HS. The genome of C. hemipterus was sequenced on the Illumina HiSeq platform (Illumina, San Diego, USA). Generated reads were trimmed using Trimmomatic [6]. The quality of the trimmed and filtered paired-end reads was assessed with FastQC [7]. The paired-end FASTQ reads were then merged using FLASH (Fast Length Adjustment of SHort reads) [8]. Heterozygosity and genome size were estimated from the merged reads by k-mer analysis, using Jellyfish v2.2.10 [9] with the -m 21 option (21-mers) and GenomeScope v1.0.0 [10]. The reads were then assembled de novo using the SPAdes genome assembler [11]. The quality of the assembled genome was evaluated using QUAST [12].
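The k-mer analysis step can be illustrated with a small Python sketch. It counts 21-mers (mirroring the -m 21 setting), builds the k-mer depth histogram that tools such as GenomeScope take as input, and derives a rough genome size from the histogram peak. Several points are assumptions rather than details from this work: counting canonical k-mers (a k-mer and its reverse complement treated as one), skipping k-mers that contain N, ignoring depth-1 k-mers as likely errors, and the peak-based estimate itself, which is a simple approximation and not the model GenomeScope fits. All function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its
    reverse complement (assumed canonical counting)."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads: Iterable[str], k: int = 21) -> Counter:
    """Count canonical k-mers across reads, skipping any k-mer that
    contains a base other than A, C, G, or T."""
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if _BASES.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts: Counter) -> Counter:
    """Map each depth to the number of distinct k-mers seen at that depth."""
    return Counter(counts.values())


def naive_genome_size(counts: Counter, min_depth: int = 2) -> float:
    """Rough estimate: total k-mer occurrences divided by the peak depth.
    K-mers below `min_depth` are ignored as likely sequencing errors."""
    hist = {d: n for d, n in kmer_histogram(counts).items() if d >= min_depth}
    if not hist:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(hist, key=hist.get)
    total_occurrences = sum(depth * n for depth, n in hist.items())
    return total_occurrences / peak_depth


if __name__ == "__main__":
    # Hypothetical toy read with k=3 so the result is easy to follow.
    toy_counts = count_kmers(["ACGTAC"], k=3)
    print(dict(toy_counts))
    print(dict(kmer_histogram(toy_counts)))
    print(naive_genome_size(toy_counts))
```

On real data the counting would be done by Jellyfish itself; the sketch only shows what the resulting histogram represents and why a single dominant depth peak carries information about genome size.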
Limitations
Not applicable.
Ethics Statement
Approval from the Human Ethics Committee at Universiti Sains Malaysia (USM/JEPeM/19120868) was obtained.
CRediT Author Statement
Lim Li: Formal analysis; Investigation; Methodology; Project administration; Validation; Writing – original draft; Writing – review & editing. Abdul Hafiz Ab Majid: Conceptualization; Funding acquisition; Methodology; Resources; Supervision; Validation; Writing – review & editing.
References
- [1] Doggett S.L., Dwyer D.E., Peñas P.F., Russell R.C. Bed bugs: clinical relevance and control options. Clin. Microbiol. Rev. 25 (1) (2012) 164-192. doi:10.1128/CMR.05015-11. PMID: 22232375. PMCID: PMC3255965.
- [2] Karunaratne S.H.P.P., Damayanthi B.T., Fareena M.H.J., Imbuldeniya V., Hemingway J. Insecticide resistance in the tropical bedbug Cimex hemipterus. Pestic. Biochem. Physiol. 88 (1) (2007) 102-107.
- [3] Baraka G.T., Nyundo B.A., Thomas A., Mwang'onde B.J., Kweka E.J. Susceptibility status of bedbugs (Hemiptera: Cimicidae) against pyrethroid and organophosphate insecticides in Dar es Salaam, Tanzania. J. Med. Entomol. 57 (2) (2020) 524-528. doi:10.1093/jme/tjz173. PMID: 31602482.
- [4] Gilles A., Meglécz E., Pech N., Ferreira S., Malausa T., Martin J.F. Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing. BMC Genomics 12 (2011) 1-11. doi:10.1186/1471-2164-12-245. PMID: 21592414. PMCID: PMC3116506.
- [5] Seri Masran S.N.A., Ab Majid A.H. Isolation and characterization of novel polymorphic microsatellite markers for Cimex hemipterus F. (Hemiptera: Cimicidae). J. Med. Entomol. 55 (3) (2018) 760-765. doi:10.1093/jme/tjy008. PMID: 29444240.
- [6] Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30 (15) (2014) 2114-2120. doi:10.1093/bioinformatics/btu170. PMID: 24695404. PMCID: PMC4103590.
- [7] Magoč T., Salzberg S.L. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27 (21) (2011) 2957-2963. doi:10.1093/bioinformatics/btr507. PMID: 21903629. PMCID: PMC3198573.
- [8] Marçais G., Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27 (6) (2011) 764-770. doi:10.1093/bioinformatics/btr011. PMID: 21217122. PMCID: PMC3051319.
