A Compact Closed Genome of Orientia tsutsugamushi from Hainan Island, China Provides a TA763_A Reference and Reveals Repeat-Driven Remodeling
Yi Niu, Yijia Guo, Zhao Xu, Siqi Chen, Liyuan Zhang, Xiuji Cui, Dachuan Lin, Kwok-Yung Yuen, Jasper Fuk-Woo Chan, Chuanning Tang, Feifei Yin

TL;DR
This paper presents a complete genome of a scrub typhus-causing bacterium from China and reveals how repetitive DNA drives genome changes.
Contribution
The study provides a closed genome of a TA763_A lineage strain from a high-incidence region and reveals repeat-driven genome remodeling.
Findings
HMU_001 has the most compact genome among 17 O. tsutsugamushi strains analyzed.
Repeats make up nearly half of the genome and largely explain genome size differences.
Genome synteny is limited across strains but higher among close relatives, indicating ongoing rearrangement.
Abstract
Scrub typhus, caused by the obligate intracellular bacterium Orientia tsutsugamushi (O. tsutsugamushi), remains a major public-health concern in the Asia–Pacific region. Genome-wide inference is complicated by extensive repetitive DNA and frequent genome rearrangement. We isolated O. tsutsugamushi HMU_001 from a scrub typhus patient on Hainan Island, China. Intracellular morphology was examined and replication was quantified in endothelial cells. Using long-read sequencing with short-read polishing, we generated a closed circular genome and performed standardized comparative analyses across all available complete O. tsutsugamushi genomes. HMU_001 assembled as a 1,895,724 bp genome and, among the 17 complete genomes analyzed in this study, represented the most compact genome. Repeats comprised 873,550 bp (46.08%) and included 72 RAGE loci (4 relatively complete) and 283 insertion…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5- —National Natural Science Foundation of China
- —Hainan Provincial International Science and Technology Cooperation Research and Development Project
- —Hainan Provincial Natural Science Foundation of China
- —Major Project of Guangzhou National Laboratory
- —Graduate Student Innovation Grant of Hainan Medical University
- —Lee Wan Keung Charity Foundation Limited
- —Marina Man-Wai Lee, and the Hong Kong Hainan Commercial Association South China Microbiology Research Fund
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsVector-borne infectious diseases · Genomics and Phylogenetic Studies · Plant Pathogenic Bacteria Studies
1. Introduction
Orientia tsutsugamushi (O. tsutsugamushi) is an obligate intracellular α-proteobacterium in the family Rickettsiaceae and the causative agent of scrub typhus. Humans acquire infection primarily through the bite of infected larval trombiculid mites (chiggers) [1]. Scrub typhus has historically been considered endemic within the “tsutsugamushi triangle” in the Asia–Pacific region, but its recognized distribution has broadened over recent decades [2,3]. In addition to re-emergence in long-established endemic settings, sporadic cases and outbreaks have been reported outside this classic range [4]. Moreover, genetically divergent Orientia lineages, including Candidatus Orientia chuto and Candidatus Orientia chiloensis, have been described in the Middle East and South America, respectively. Emerging evidence indicates wider diversity within the genus and underscores ongoing questions about the evolution and global risk of Orientia sp. [5,6].
Clinically, scrub typhus typically presents as an acute febrile illness with nonspecific manifestations, such as headache, myalgia, cough, and gastrointestinal symptoms, rash, and lymphadenopathy [1]. Eschar at the site of the chigger bite is pathognomonic but is absent in a substantial fraction of patients [7]. Delayed diagnosis and treatment may lead to severe complications such as pneumonitis or acute respiratory distress syndrome, meningoencephalitis, acute kidney injury, disseminated intravascular coagulation, and myocarditis [8,9]. Laboratory confirmation relies primarily on serologic assays, such as indirect immunofluorescence assay, enzyme-linked immunosorbent assay (ELISA), and polymerase chain reaction (PCR) targeting O. tsutsugamushi genes such as 56-kDa type-specific antigen (tsa56), tsa47, 16S rRNA, and groEL [10,11,12]. Although doxycycline and related agents are generally effective, the clinical burden remains substantial in many endemic settings, particularly where access to timely diagnostic and treatment is limited [13,14].
A long-standing challenge in scrub typhus research is the extraordinary antigenic and genetic diversity of O. tsutsugamushi. Early strain classification relied on serologic approaches using prototype strains (Gilliam, Karp, Kato), while molecular typing of the immunodominant tsa56 gene is widely used for strain assignment in clinical and vector samples [15,16,17,18]. However, accumulating evidence indicates that tsa56-based classification only partially reflects genome-scale relationships, as virulence and phenotypic variation likely arise from distributed strain-specific determinants rather than a single locus [19,20,21,22]. Multilocus sequence typing (MLST), or concatenation of TSA56 with selected autotransporter loci such as ScaA, can improve phylogenetic resolution but remains influenced by homologous recombination and does not fully capture genome-wide architecture [20,23,24,25]. These limitations highlight the importance of genome-scale analyses for resolving evolutionary relationships in O. tsutsugamushi.
Whole-genome sequencing provides the most comprehensive framework for understanding O. tsutsugamushi diversity and for linking strain variation to phenotypes relevant to transmission, immune recognition, and disease. However, complete-genome sequencing of O. tsutsugamushi remains technically demanding because its chromosome is unusually large for an obligate intracellular bacterium (approximately 1.9–2.5 Mb in published complete genomes) and is densely populated by repetitive, mobile element–derived DNA [22,26,27]. These repeats are dominated by Rickettsiales amplified genetic elements (RAGEs) that resemble integrative and conjugative element systems, and by insertion sequences (IS elements) [27,28]. Repeat proliferation is accompanied by extensive pseudogenization and fragmentation, and comparative analyses have shown profound disruption of gene order among strains, consistent with ongoing repeat-mediated rearrangement [22]. Long-read sequencing has improved the ability to span repeats and generate closed assemblies [22,29], but the number of publicly available complete genomes remains limited, constraining robust phylogeographic inference and comprehensive comparisons across endemic regions and tsa56 genotypes. Consequently, because complete genomes remain sparse and unevenly distributed across genotypes and regions, it remains unclear whether repeat burden and genome architecture vary systematically among lineages or geographic settings.
Hainan Island in southern China is an endemic setting for scrub typhus, with a mean annual reported incidence of 2.75 per 100,000 population during 2008–2023 and an overall increasing trend [30]. Recent clinical surveillance further detected O. tsutsugamushi in 14.51% (73/503) of patients with undifferentiated febrile illness in Hainan during 2018–2021 [31]. Despite its public-health importance, closed genomic references from this region are lacking. In this study, we isolated O. tsutsugamushi strain HMU_001 directly from a laboratory-confirmed scrub typhus case in Hainan from which a viable isolate was successfully recovered and propagated, enabling intracellular characterization and whole-genome sequencing. The case was also clinically well characterized, allowing interpretation of the genome in a defined clinical context. In addition, the strain belonged to the TA763_A tsa56 lineage, for which no complete O. tsutsugamushi genome was available in the comparative dataset analyzed in this study. Using long-read sequencing complemented by short-read polishing, we generated a complete circular genome and performed standardized comparative analyses against all publicly available complete O. tsutsugamushi genomes. Our main objectives were to (i) provide a closed genome reference representing a contemporary clinical isolate from a tropical island setting, (ii) quantitatively describe variation in genome composition, repeat burden, gene duplication, and gene degeneration across the species, and (iii) place HMU_001 within a core-genome phylogeny while evaluating patterns of genome synteny and rearrangement.
2. Materials and Methods
2.1. Ethics Statement
This study was approved by the Ethics Committee of Hainan Medical University (Approval No. HYLL-2020-061). Written informed consent was obtained from the patient for participation and for publication of de-identified information.
2.2. Case Identification and Laboratory Confirmation
A laboratory-confirmed case of scrub typhus from Hainan Island, China identified in 2023 was included in this study. Whole blood and serum samples were collected at admission for diagnostic testing. Serum IgM and IgG antibodies against O. tsutsugamushi were screened using a colloidal gold immunochromatographic assay (GICA; Wantai, Beijing, China) targeting TSA56. In parallel, DNA extracted from whole blood was tested using a multiplex real-time PCR targeting the tsa56, tsa47, and groEL genes (Zybio, Chongqing, China), following the manufacturer′s instructions. For tsa56 lineage assignment, a partial fragment was amplified by nested PCR [32] and its sequence was obtained by Sanger sequencing. Genotypes were assigned based on sequence similarity and phylogenetic placement. Clinical and laboratory information was retrieved from the patient’s medical record.
2.3. Isolation and Culture of O. tsutsugamushi from Blood Clot
Approximately 1 mL of blood clot was mechanically disrupted with 3 mm zirconia beads (4000 rpm for 15 s, three cycles). The suspension was inoculated onto 80–90% confluent L-929 monolayers in 25 cm^2^ flasks. Inoculated cells were maintained at 35 °C with 5% CO_2_ in Minimum Essential Medium (MEM; Gibco, Grand Island, NY, USA) supplemented with 2% horse serum (Gibco, Grand Island, NY, USA) and non-essential amino acids (NEAA; Gibco, Grand Island, NY, USA). Cultures were examined daily by light microscopy, and the culture medium was replaced at 48 h post-inoculation and subsequently as needed. Successful infection was assessed by the development of cytopathic effects (CPE) and Giemsa staining following standard procedures. When CPE involved ≥90% of the monolayer (as estimated by microscopy), infected cells were passaged at a 1:2 split ratio by scraping and transferring the cell suspension onto fresh L-929 monolayers.
2.4. Immuno-Electron Microscopy
L-929 cells were infected with O. tsutsugamushi at an input ratio equivalent to a multiplicity of infection (MOI) of 100 bacterial genome copies per cell, calculated from the genome DNA load and the host–cell count at inoculation. At 3 days post infection (dpi), cells were fixed in 2.5% glutaraldehyde, embedded in resin, sectioned into 70–80 nm slices and mounted on nickel grids. Sections were incubated with rabbit anti-TSA56 polyclonal antibody (Sangon Biotech, Shanghai, China; 1:5000), followed by incubation with 10 nm colloidal gold-conjugated goat anti-rabbit IgG secondary antibody (Sigma-Aldrich; St. Louis, MO, USA, 1:2000). Labeled sections were examined using a Hitachi HT7800 transmission electron microscope (Hitachi High-Tech, Tokyo, Japan).
2.5. Growth Kinetics by qPCR
Human umbilical vein endothelial cells (HUVECs) were seeded in 12-well plates at 1 × 10^5^ cells per well and incubated overnight to allow adherence. O. tsutsugamushi was purified from infected L-929 cells as described previously [33]. Briefly, infected L-929 cells were disrupted by 1 mm zirconia beads (3000 rpm for 15 s, three cycles) and then centrifuged at 50× g for 5 min to remove cell debris. The supernatant was subsequently centrifuged at 20,000× g for 10 min to pellet bacteria. The bacterial pellet was resuspended in new culture medium and used to infect HUVEC monolayers at an MOI of 100. Infected HUVECs were maintained at 35 °C with 5% CO_2_ in Dulbecco’s Modified Eagle Medium (DMEM; Gibco, USA) supplemented with 2% fetal bovine serum (Gibco, USA). Cells were harvested at 4 h post infection and then daily from 1 to 9 dpi. Bacterial growth curves were generated by plotting the genome copy number per well over time. For quantifying the bacteria, total genomic DNA was extracted from each well using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) and quantified by real-time qPCR targeting tsa56 using the primers: 5′-GGCCAAGTTAAACTCTATGCTGA-3′ (forward) and 5′-CATTAATTGCTACACCAAGTGC-3′ (reverse), yielding an approximately 163 bp amplicon. Reactions were performed with TB Green^®^ Premix Ex Taq™ II (Takara, Kyoto, Japan) on an BIO-RAD CFX96 Real-Time PCR System (Bio-Rad, Hercules, CA, USA) under the following cycling conditions: 95 °C for 30 s; 40 cycles of 95 °C for 5 s and 60 °C for 30 s; followed by melt-curve analysis. Bacteria genome copy numbers were calculated from a plasmid DNA standard curve (10^1^–10^8^ copies/reaction). The experiment was performed independently three times. Bacterial doubling time (d.t.) was calculated as d.t. = t × 0.301/log_10_ (N_2_/N_1_), where t represented the elapsed time and N_2_/N_1_ was the fold change in bacterial genome copies between two time points. Bacteria load at each time point and d.t_min_ were represented by mean ± SD.
2.6. Genomic DNA Preparation and Whole-Genome Sequencing
A total of 1 × 10^7^ L-929 cells were infected with O. tsutsugamushi at an MOI of 100 and incubated at 35 °C with 5% CO_2_. At 5 dpi, cells were harvested and mechanically disrupted with 1 mm zirconia beads (3000 rpm for 15 s, three cycles). The lysate was centrifuged at 50× g for 10 min to remove host–cell debris, and bacteria in the supernatant were further pelleted at 20,000× g for 30 min. The pellet was washed three times with phosphate-buffered saline (PBS) and genomic DNA was extracted using the QIAamp DNA Mini Kit. High molecular weight DNA (≥50 kb) was used for long-read sequencing. A SMRTbell library (Pacific Biosciences, Menlo Park, CA, USA) was prepared and genomic DNA was sheared to an average size of approximately 15 kb, followed by end-repair and SMRTbell adapter ligation. Fragments ≥ 15 kb were size-selected on a BluePippin instrument (Sage Science, Beverly, MA, USA) and sequenced on a PacBio Revio Single-Molecule Real-Time (SMRT) Sequencing system (Pacific Biosciences, Menlo Park, CA, USA). Circular consensus sequencing (CCS) was performed using PacBio ccs software to generate high-fidelity (HiFi) reads (long reads). In addition, a short-read paired-end library (input DNA ≥ 0.2 µg) was prepared using the Watchmaker DNA Library Prep Kit (7K0019-096; MGI Tech, Shenzhen, China) and sequenced on an MGI DNBSEQ-T7 Sequencing System (MGI Tech, Shenzhen, China) to generate 150-bp paired-end reads (short reads), following the manufacturers’ instructions.
2.7. Genome Assembly, Annotation and Comparative Analysis
PacBio HiFi reads were filtered for host contamination by mapping to the mouse reference genome (GCF_000001635.27) using minimap2 v2.29, and unmapped reads were de novo assembled with hifiasm v0.12 (default parameters) [34]. The circular chromosome was finalized with Circlator v1.5.5 [35], including rotation to start at dnaA. Paired-end short reads were trimmed with Trimmomatic v0.33 and screened against the mouse genome with Bowtie2 v2.2.4. Non-host short reads were mapped to the HiFi assembly with BWA-MEM2 v2.2.1, and three iterative rounds of polishing were performed using Pilon v1.22 to correct residual base-level errors and improve consensus accuracy of the assembly [36].
Assembly quality was further evaluated by examining sequencing depth, consensus accuracy, and genome circularization. After host-read removal, effective sequencing depth at each genome position was calculated using samtools depth. Sequencing depth distribution histogram and genome-wide sequencing depth profile plot were examined to assess coverage distribution and mapping consistency. Regions showing reduced depth were inspected to evaluate potential mapping bias associated with repetitive genomic regions. Base-level alignment statistics, including mismatch and indel rates, were calculated from samtools stats reports. To validate circularization, a junction-containing reference was generated by appending the terminal 20 kb of the assembly to the genome start, and both long-read and short-read datasets were remapped to quantify reads spanning the circularization junction.
Protein-coding sequences (CDSs) were predicted and annotated with Prokka v1.14.6 [37] using a custom database derived from NCBI PGAP annotations for previously sequenced O. tsutsugamushi genomes. rRNA and tRNA predictions were refined using Infernal v1.1.5 [38] and tRNAscan-SE v2.0.9 [39], respectively. Pseudogenes were predicted with Pseudofinder v1.1.0 [40] under the following criteria: (i) CDSs were searched against the NCBI nr database, and CDSs shorter than 90% of the mean length of their top 150 BLAST homologs were classified as truncated genes. This threshold was selected empirically during method optimization by comparing the effects of several alternative parameter settings and was used as a practical screening cutoff in our dataset. Our pseudogene analysis was intended as a conservative comparative screen for likely truncation- and fragmentation-associated pseudogenization rather than optimization of a locus-specific functional cutoff. The use of mean homolog length followed the native implementation of the workflow employed here. Comparison with the mean length of the top 150 BLAST homologs was used to provide a stable reference length distribution while reducing the influence of annotation artifacts or atypical individual matches. (ii) Adjacent CDSs sharing 30% of their hits, or CDSs sharing hits with intergenic regions, were classified as split genes. Because degraded ancestral genes in O. tsutsugamushi are frequently represented as short CDSs or multiple adjacent fragments, original disruptive mutations such as internal stop codons or frameshifts would not be reliably recovered; therefore, we relied primarily on homology-based evidence of truncation and fragmentation. This approach was intended to distinguish likely truncation events from natural homolog length variation by combining a conservative length threshold with comparison against a broad homolog set. CDSs longer than their homolog length distribution and pseudogene candidates located entirely within intergenic regions were not included. COG functional categories were assigned with eggNOG-mapper v2.1.12 [41]. Sixteen publicly available complete O. tsutsugamushi genomes (Table 1) were re-annotated using the same pipeline; genomes not starting at dnaA were rotated using Circlator (fixstart).
Orthogroups were inferred with OrthoFinder v3.0.1b1 [42], and single-copy core genes were defined as genes in orthogroups presented in all 17 genomes with exactly one ortholog per genome. Concatenated amino acid sequences of single-copy orthologs aligned with MAFFT v7.525 were used to infer a maximum-likelihood phylogeny in RAxML v8.2.12 [43] (PROTGAMMALG, 1000 bootstraps; visualized in FigTree v1.4.4). For comparison, partial tsa56 nucleotide sequences (approximately 483 bp) were aligned with ClustalW v2.1 and analyzed by RAxML (GTRGAMMA, 1000 bootstraps), rooted on the Boryong strain. MLST sequence types (STs) of the 17 strains analyzed in this study were inferred from nucleotide sequences of seven housekeeping genes (gpsA, mdh, nrdB, nuoF, ppdK, sucB, and sucD) extracted from genome data according to the allele profiles defined in the PubMLST O. tsutsugamushi database (https://pubmlst.org/organisms/orientia-tsutsugamushi/ (accessed on 10 March 2026)) [44].
Repeat regions were identified by BLASTn v2.16.0+ self-alignment of the chromosome (identity ≥ 90%, alignment length ≥ 50 bp, E < 1 × 10^−5^). Self-hit alignments were removed, and overlapping coordinates were merged using BEDTools v2.31.1 to define nonredundant repeat intervals. Repeat content was calculated as the total base number of merged intervals. Repeat genes were defined as genes with at least one genomic homolog detected by BLASTn (identity ≥ 90%, query coverage ≥ 80%; E-value < 1 × 10^−5^). The proportions of repeat genes in each COG categories, and the proportions of pseudogenes in repeat genes and non-repeat genes, were calculated and compared among the 17 genomes. In addition, pseudogenization rates in repetitive and non-repetitive genes were compared across the 17 genomes using a paired Wilcoxon signed-rank test; p < 0.05 was considered statistically significant.
RAGEs in the HMU_001 genome were identified based on published structural criteria and manual curation, focusing on genomic regions containing ICE-associated modules (integrase genes and conjugation-associated F-T4SS gene clusters), transposase/reverse transcriptase genes together with associated cargo genes [27,28]. Following published criteria [28], we distinguished “complete RAGE”, defined as structurally complete elements retaining the full set of intact mobilization gene categories in overall canonical organization, from “complete RAGE with truncated or split genes”, in which the full set of mobilization gene categories is present but at least one mobilization gene is truncated or split. Loci lacking key mobilization components were classified as incomplete. To further assess gene integrity within RAGEs, conserved domains were evaluated using SMART v10.0 as described previously [28], and RAGE-associated genes were categorized as intact if they contained all key functional domains. ISs in HMU_001 were detected by BLASTn searches against the ISfinder database v2.16.0 (identity ≥ 90%, E-value <1 × 10^−5^), with manual verification of boundaries where needed. A complete IS was defined as an element containing a full-length transposase gene.
Pairwise synteny across the 17 genomes was assessed using MUMmer v4.0.1 (nucmer, MUM mode with a minimum cluster length of 500 bp to reduce noise from repetitive regions) [45]. Syntenic blocks ≥ 2000 bp (identity ≥ 90%) were visualized in R v4.4.3 (ggplot2) or Easyfig v2.2.5. For each pairwise comparison, the syntenic ratio was calculated as the total length of syntenic blocks divided by the query genome length, and the number of syntenic blocks was recorded. When calculating length of syntenic blocks, adjacent syntenic blocks that were slightly overlapped were merged using BEDTools to prevent repeated calculation of bases at the junctions, which had almost no impact on estimation of syntenic ratios.
2.8. Data Availability
The complete genome sequence of O. tsutsugamushi strain HMU_001 is available in NCBI (accession: CP191261, BioProject: PRJNA1258383, BioSample: SAMN48320726). Raw reads were deposited in SRA (SRR33649153 (PacBio), SRR33649152 (MGI)).
3. Results
3.1. Clinical Presentation and Laboratory Findings
A 57-year-old female forestry worker living in southern Hainan Island was admitted to the Second Affiliated Hospital of Hainan Medical University in July 2023. The patient reported a five-day history of fever after working in tropical rainforest areas. On admission, she had fever of up to 39.5 °C and hypoxemia with SpO_2_ of 89% at room air. Physical examination revealed a 1.0 cm × 0.5 cm eschar at the left inguinal region. Chest computerized tomography scans of the thorax showed scattered bilateral pulmonary infiltrates. Laboratory tests showed leukocytosis with marked neutrophilia, elevated inflammatory markers, coagulation abnormalities, mild hepatic and renal dysfunction, and hypoalbuminemia (Table S1). Arterial blood gas testing indicated metabolic acidosis with elevated lactate (Table S1).
Serum IgM and IgG against O. tsutsugamushi were positive. Quantitative PCR was positive with an estimated bacterial load of 2.68 × 10^5^ copies/mL. Genotyping of a partial tsa56 fragment assigned the strain to the TA763_A lineage. The patient was diagnosed with scrub typhus complicated by pneumonia, hypoxemia, acid-base imbalance, and hepatic and renal dysfunction. Doxycycline (100 mg intravenously every 12 hourly) was administered and that patient improved with defervescence, improved oxygenation, and normalization of laboratory tests in the ensuing 48 h (Table S1). The patient was discharged after 5 days of hospitalization.
3.2. Bacterial Morphology and Growth Kinetics
O. tsutsugamushi was isolated from the patient’s blood clot in L-929 cell monolayers and designated as strain HMU_001. Bacterial morphology and intracellular localization were examined by Giemsa staining and immuno-electron microscopy. On Giemsa staining, infected cells showed dense perinuclear clusters of purple granular organisms, whereas uninfected controls displayed pale-purple reticular cytoplasm (Figure 1A,B). Immuno-electron microscopy revealed intracellular bacteria that were spherical to ovoid or short rods, approximately 0.3–0.5 × 0.8–1.5 µm, and predominantly clustered near the nucleus. Each organism displayed a double-membrane envelope with a prominent electron-lucent intermembrane space and a loose, heterogeneous electron-dense internal matrix. Sparse immunogold labeling targeting TSA56 was observed on the bacterial surface by electron microscopy, with an uneven distribution along the envelope (Figure 1C,D). Occasionally, membrane-enclosed organisms were observed near the cell periphery, showing a budding-like appearance; however, this finding is descriptive only and does not imply a specific replication mechanism (Figure 1C).
To quantify replication, we infected HUVECs in 12-well plates with purified bacteria at an MOI of 100 and measured bacterial genome DNA by qPCR over time. At 4 h post-infection (defined as day 0), the bacterial load was (2.29 ± 1.05) × 10^7^ genome copies per well. The bacterial burden increased modestly during 0–2 dpi, then entered a clear exponential phase between 3–6 dpi, reaching (1.09 ± 0.56) × 10^10^ copies per well. Growth slowed thereafter and plateaued between 7–9 dpi at (1.92 ± 0.69) × 10^10^ genome copies per well, representing an approximately 1000-fold increase over the initial level. The steepest increase occurred between 3 and 4 dpi, corresponding to a minimal doubling time of 9.36 ± 1.78 h (Figure 1E).
3.3. Genome Composition and Functional Overview of O. tsutsugamushi HMU_001
We conducted genome sequencing of HMU_001 using both long-read and short-read platforms. Long-read sequencing generated 1,356,103 high-fidelity reads (26.30 Gb; mean length 19,391 bp) of which 1,076,959 reads (79.42%; 21.52 Gb) remained after host reads removal. Short-read sequencing produced 164,238,400 paired-end reads (24.64 Gb); after quality filtering and host reads removal, 105,101,742 reads (63.99%; 15.71 Gb) were retained. Filtered reads mapped to the final assembly at a mean depth of 11,337× (long reads) and 8146× (short reads), with minimum depth of 7515× and 503×, respectively. Re-examination of the depth distribution showed that the relatively low short-read depth was confined to within ~100 bp of the start/end positions of the linearized circular assembly, whereas coverage across the remainder of the genome was highly uniform; 95% of genomic sites showed short-read depths between 6784× and 9522×. This discrepancy is therefore best explained by a boundary mapping effect rather than widespread uneven coverage associated with the repetitive genome structure. The mismatch rate was 0.1738% for long reads and 0.0417% for short reads, while the indel rate was 0.1403% for long reads and 0.0032% for short reads. These low alignment error rates support the high base-level accuracy of the final assembly. In addition, 10,507 long reads spanning the circularization junction provided strong support for successful genome circularization. To assess within-isolate heterogeneity and exclude multiple genotypes, we mapped the host-filtered short reads back to the finalized assembly and examined allele frequencies across genome-wide SNP sites. The allele-frequency distribution was unimodal with a dominant peak near fixation and no secondary peak, supporting a single predominant genotype for HMU_001. Consistently, Sanger sequencing chromatograms of tsa56 from serial passages showed clean single peaks at informative positions, indicating genotype stability during in vitro propagation.
The HMU_001 genome comprises a single circular chromosome of 1,895,724 bp with a GC content of 30.40%. Genome annotation predicted 2075 genes, including 2038 CDSs, 3 rRNA genes, and 34 tRNA genes (Figure 2A, Table S2). Among the 17 complete O. tsutsugamushi genomes analyzed in this study, HMU_001 had the smallest genome size and the fewest predicted genes. Using Pseudofinder, we identified 765 pseudogene candidates derived from 944 CDSs, corresponding to 46.32% of the 2038 predicted CDSs (Figure 2A). These included 587 truncated CDSs (<90% of the average length of homologs) and 178 merged pseudogenes consolidated from 357 adjacent CDSs or intergenic regions sharing >30% BLAST hits (Figure 2A, Tables S2 and S3). Summary of currently available O. tsutsugamushi strains and their genomes were shown in Table 1.
Functional classification using eggNOG assigned COG categories to 1693 CDSs (83.07%). The most represented category was “L: Replication, recombination, and repair” (464 genes), accounting for 22.77% of the total number of CDSs, followed by “S: Function unknown” (439 genes; 21.54%). Other common categories included “J: Translation, ribosomal structure and biogenesis” (127 genes; 6.23%) and “U: Intracellular trafficking, secretion, and vesicular transport” (121 genes; 5.94%). There were 345 CDSs (16.9%) that could not be placed into any COG category (Figure 2A, Table S2). Across the 17 complete genomes, all strains shared the same overall set of functional categories (Figure 2B; Table S4). Despite its smaller genome, HMU_001 showed an overall functional profile comparable to other complete genomes. Strain differences seemed driven mainly by variation in gene counts within specific categories.
3.4. Repeated Sequence Analysis
Repetitive sequences in HMU_001 totaled 873,550 bp, accounting for 46.08% of the genome. Loci of BEDTools merged repetitive sequences were shown in Figure 3A and Table S5. Across all 17 complete O. tsutsugamushi genomes, the nonrepetitive (“unique”) fraction was highly consistent at ~1.00 Mb and varied by no more than 0.04 Mb, with TA686 showing a slightly smaller unique length of 0.97 Mb. In contrast, repeat content varied widely, ranging from 0.87 Mb in HMU_001 to 1.45 Mb in Gilliam (Figure 3B, Table S6). Variation in size of repetitive sequences largely accounted for differences in total genome size, supporting repeat-driven genome expansion within the species.
We then classified genes as repetitive if they had at least one additional homolog elsewhere in the same genome. HMU_001 contained 1127 repetitive genes, with the remaining 911 genes classified as non-repetitive (Table S2). Across the 17 genomes, non-repetitive gene numbers were relatively stable (typically ~900; 872 in Gilliam to 932 in TW-22), whereas repetitive gene numbers varied markedly (1127 in HMU_001 to 1837 in Gilliam), indicating that inter-strain gene-content differences are driven primarily by variable expansion of multi-copy gene families (Figure 3B, Table S7). Functionally, repetitive genes were strongly enriched in a subset of COG categories: most notably “T: Signal transduction mechanisms” (repeat-gene ratios 76.27–92.42% across strains), with “S: Function unknown” and “L: Replication, recombination and repair” also predominantly repetitive (>70% in most genomes). In contrast, most housekeeping and metabolic categories showed low duplication, with “C: Energy production and conversion” being essentially non-repetitive in nearly all genomes except JJOtsu5, where category-specific expansion (largely annotated as hypothetical proteins) inflated the repetitive fraction (Table S4). Thus, differences between strains seemed largely quantitative (repetitive genes expansion within categories) rather than qualitative (gain/loss of functional categories).
We further compared pseudogenization rates between repetitive and non-repetitive genes. Across the 17 genomes, the proportion of pseudogene candidates was consistently lower among non-repetitive genes (median, 19.98%; IQR, 1.09%) than among repetitive genes (median, 67.02%; IQR, 11.33%) (Figure 3C, Table S7). This difference was significant across genomes (paired Wilcoxon signed-rank test, p < 0.001), consistent with more frequent truncation or degeneration among repetitive genes, which also showed greater variability in pseudogenization rates.
The repeat landscape of HMU_001 was dominated by mobile genetic elements, particularly RAGEs and insertion sequences (ISs), providing a mechanistic basis for repeat accumulation and genome plasticity. We identified 72 RAGEs in HMU_001 (Figure 3A; Table S8). Using the criteria defined in the Materials and Methods, no complete RAGE was identified in HMU_001; instead, four loci were classified as complete RAGE with truncated or split genes, indicating that although structurally complete RAGE-like regions were present, none retained fully intact mobilization genes (Figure 3D). Overall, HMU_001 RAGEs showed substantial degeneration, including truncation, fragmentation, and loss of canonical mobilization modules, and many were reduced to only a few residual genes. Nevertheless, 22 RAGE loci retained recognizable canonical structural features (Figure S2). Importantly, at least one intact copy of each mobilization gene category was present in the HMU_001 genome, although these genes were dispersed across loci rather than organized in a contiguous module. IS elements were also abundant and dispersed across the chromosome. We detected 283 IS elements, including 54 apparently complete ones with intact transposase genes, while the remainder were truncated or degenerated (Figure 3A, Table S9).
3.5. Orthologous Groups, and Phylogenetic Analysis and MLST Classification
Across the 17 genomes, a total of 40,325 genes were analyzed. Of these, 38,512 genes (38,512/40,325, 95.50%) clustered into 3927 orthogroups, while 1813 genes (1813/40,325, 4.50%) remained unassigned. Among the orthogroups, 343 were species-specific, comprising 1351 genes (3.35% of all genes). A core set of 664 orthogroups was present in all genomes, including 643 single-copy orthogroups (Table S2). A maximum-likelihood tree based on a concatenated alignment of the 643 single-copy core genes separated the 17 genomes into four clades (Figure 4A). HMU_001 grouped with Wuj/2014, TW-1, and UT76 in Clade I. Clade II comprised JJOtsu1, JJOtsu5, JJOtsu6, JJOtsu7, and JJOtsu8 (isolated in Vellore, India) clustered with UT176 and Karp. TW-22, Kato, and Ikeda formed Clade III, whereas TA686, Gilliam, and Boryong formed distinct branches within Clade IV. Notably, JJOtsu6 and JJOtsu1 were nearly identical, and Wuj/2014 and TW-1 also showed very close genetic relatedness.
Mapping clades to sampling locations suggested geographic structure with partial overlap (Figure 4B). Clade I comprised genomes from southeastern coastal and offshore islands of China and Thailand. Clade II encompassed genomes from South Asia (India and Thailand) and Southeast Asia (New Guinea). Clade III included genomes from regions in Northwest Pacific. Clades I to III exhibited relatively localized distributions. In contrast, Clade IV showed a broader distribution spanning Korea, Thailand, and the India–Myanmar border region, consistent with its greater phylogenetic divergence. The tsa56 genotypes, the most widely used scheme for O. tsutsugamushi strain classification, did not fully align with the core-genome clade structure. In Clade I, all strains were identified as Karp_A by tsa56 genotype except HMU_001 (TA763_A). Clade II contained a mixture of Karp_A, Karp_B, Karp_C, Kato_A, and TA763_B genotypes. Clade III included Kato_A, Kato_B, and JG_A/Gilliam genotypes. Clade IV comprised Shimokoshi, Gilliam, and Boryong/Karp genotypes (Table 1; Figure S3).
Sequence types inferred from the seven housekeeping loci are summarized in Table 1, and detailed allele profiles are provided in Table S10. Overall, the 17 strains showed substantial MLST diversity. JJOtsu6 and JJOtsu1 shared identical allele profiles at all seven loci and were both assigned to ST132. Wuj/2014 and TW-1 shared identical alleles at six of seven loci and differed only at nuoF. These highly similar MLST profiles were consistent with their close relationships in the core-genome phylogeny. Several strains, including HMU_001, JJOtsu8, Wuj/2014, TW-1, TW-22, and TA686, had seven-locus profiles that had not been assigned official STs in the PubMLST database at the time of analysis; for these strains, the nearest STs were provided in Table S10.
3.6. Genome Synteny Analysis
Genome-wide synteny between HMU_001 and the other 16 complete genomes was shown in Figure 5A. Overall, HMU_001 showed limited synteny with most strains. However, relatively higher synteny was observed with Wuj/2014, TW-1, and UT76, consistent with their clustering with HMU_001 in the core-genome phylogeny. Quantitative comparisons of syntenic blocks (Figure 5B, Table S11) showed that HMU_001 shared the largest proportion of syntenic sequence with UT76 (81.97%), TW-1 (81.43%), and Wuj/2014 (81.00%). Correspondingly, HMU_001 aligned in 168, 158, and 157 syntenic blocks with UT76, TW-1, and Wuj/2014, respectively. The lowest synteny with HMU_001 was observed for TA686 and Boryong, which belong to the most distant Clade IV: only 46.93% and 50.97% of the HMU_001 genome aligned to these strains, respectively, across 147 syntenic blocks in each comparison.
Pairwise synteny analysis across all 17 genomes further confirmed that most genome pairs showed limited collinearity, whereas a small number of closely related genomes retained extensive synteny. The closest pair in the phylogeny, JJOtsu6 and JJOtsu1, showed near-identical genome organization with only minor inversions and small differences. Wuj/2014 and TW-1 likewise exhibited substantially higher synteny than more distantly related pairs such as Boryong and TA686 (Figure 5C). Syntenic block counts and aligned proportions for all genome pairs are provided in Table S11 and summarized in a bubble plot (Figure 5D). Pairs with very high syntenic proportions typically showed fewer, larger pieces of blocks (JJOtsu6–JJOtsu1: 94.47% in JJOtsu6 and 99.24% in JJOtsu1 across 12 blocks; Wuj/2014–TW-1: 98.88% in Wuj/2014 and 97.12% in TW-1 across 32 blocks). As the proportion of syntenic sequence decreased, block number generally increased and then declined. At the low-synteny extreme, Boryong and TA686 shared the lowest syntenic proportions (35.25% in Boryong and 33.23% in TA686), but without exhibiting the highest block count (121 blocks).
4. Discussion
In this study, we isolated O. tsutsugamushi HMU_001 from a scrub typhus patient in Hainan, China, and generated a closed circular genome using long-read sequencing complemented by short-read polishing, thereby adding a complete genomic reference from a tropical island setting of ongoing public-health importance. HMU_001 provided a complete genome for the TA763_A tsa56 lineage and, among the 17 complete genomes analyzed in this study, represented the most compact genome. Notably, this compaction reflects reduced repeat burden rather than contraction of the ~1.00 Mb nonrepetitive backbone, reinforcing a central feature of O. tsutsugamushi evolution in which genome size variation is primarily repeat-driven while core content remains comparatively stable. Using standardized re-annotation and genome-wide comparisons, we further showed that repeats largely derived from genes associated with “Signal transduction mechanisms”, “Function unknown” and “Replication, recombination and repair”, consistent with the gene function of mobile elements, including RAGEs and IS elements. These elements were presumed to be closely related with pervasive disruption of gene order, yielding minimal synteny across most genome pairs but substantial collinearity among the closest relatives. A core-genome phylogeny based on 643 single-copy genes resolved four clades with partial geographic structure and placed HMU_001 with Wuj/2014 from Zhejiang and TW-1 from Taiwan. In contrast, tsa56 genotypes were only partly congruent with genome-wide relationships. Together, these findings provide a framework linking repeat proliferation, genome rearrangement, and phylogenetic inference, strengthening the genomic basis for interpreting typing and evolution in this highly plastic, neglected intracellular pathogen.
The epidemiologic context on Hainan Island underscores the value of closed genomes derived from contemporary clinical isolates. Scrub typhus remains as an endemic public health concern in this tropical island setting, particularly among people with frequent outdoor exposure. A spatiotemporal analysis reported 4300 cases during 2008–2023, corresponding to a mean annual reported incidence of 2.75 per 100,000 and an overall upward trend [30]. Studies of undifferentiated fever further suggest substantial exposure and ongoing transmission. In a four-hospital survey of 680 patients, the seropositivity of IgM, IgG, and PCR was 23.97%, 36.62%, and 20.88%, respectively, and 32.79% met diagnostic criteria for scrub typhus, with a clear autumn peak [19]. Scrub typhus in Hainan also occurs within a broader rickettsial landscape: an outpatient cohort detected O. tsutsugamushi and Anaplasma phagocytophilum DNA in 14.51% and 5.57% of patients, and 10.96% of scrub typhus cases showed co-detection, emphasizing the need for reliable molecular discrimination among co-circulating agents [31]. From a genomic epidemiology perspective, patient-centered typing across Hainan has revealed at least 12 major tsa56 genotypes, including lineages found in other parts of Southeast Asia with identical MLST sequence types across locations separated by 23–125 km, indicating within-island dissemination [20]. The regional distribution of related genotypes suggests that transmission connectivity may extend beyond strictly local rodent–chigger cycles. Because several of these genotypes are prevalent along the East Asian–Australasian flyway and multiple genetic-diversity hotspots overlap protected wetlands used by wintering waterbirds, migratory hosts that carry trombiculid mites warrant consideration as potential contributors to longer-range dispersal [20]. Coordinated sampling across humans, small mammals, mites, and bird-associated habitats coupled with genome-wide sequencing with help to address this hypothesis. In this setting, HMU_001 provides a closed genome for the TA763_A lineage from a recent clinical case, strengthening the reference framework for molecular surveillance and future studies integrating clinical metadata with genome-wide variation.
We further paired the closed genome of HMU_001 with a set of standardized in vitro measurements to provide biological context for genome-scale comparisons. In cell culture, HMU_001 showed the characteristic pleomorphic intracellular forms and perinuclear clustering consistent with the established intracellular lifecycle of O. tsutsugamushi. Using a qPCR-based assay, we quantified replication in endothelial cells and estimated a minimum doubling time of 9.36 ± 1.78 h. Previous reported qPCR-derived doubling times for diverse strains span approximately 9.50–19.85 h under fibroblast-based assays [46]. However, because kinetic estimates are highly dependent on host cell type and assay design, and HMU_001 was measured here in HUVECs, these published values are provided only as a general reference range and should not be interpreted as directly comparable to the present measurements. This caution is consistent with experimental evidence that cellular tropism and disease manifestations can vary across models [47,48], and that in vitro growth does not necessarily predict in vivo pathogenicity, which likely reflects a multifactorial interplay between bacterial determinants and host responses [21]. Nevertheless, pairing a contemporary clinical isolate with both phenotyping and a closed genome enables testable, multi-strain comparisons. Such studies can assess whether genome architecture, particularly repeat burden and mobile element content, is associated with replication dynamics in relevant cell types, dissemination phenotypes, or clinical severity.
Comparative analyses across the 17 complete genomes provide a framework for interpreting O. tsutsugamushi genome evolution. Despite marked genome size heterogeneity among O. tsutsugamushi strains, the nonrepetitive backbone remained constrained at approximately 1.00 Mb across strains, approached the size range typical of other compact genomes in Rickettsiales [27]. In contrast, differences in total genome size were explained primarily by expansion and contraction of repetitive DNA. A broadly comparable pattern has been reported in other Rickettsiales, including Rickettsia buchneri, and Wolbachia pipientis (family Anaplasmataceae). R. buchneri harbors a repeat fraction on the order of ~35% in a 1,777,907-bp genome, whereas W. pipientis strain wMel contains a much lower repeat fraction (~14.2%) in a streamlined 1,267,782-bp genome [49,50]. In this context, HMU_001 represents a lower-repeat endpoint within O. tsutsugamushi (46.08%) and, among the 17 complete genomes analyzed in this study, the most compact genome (1,895,724 bp). Its smaller size therefore tracks with a reduced repetitive fraction rather than with substantial loss of conserved functional gene content. Accordingly, HMU_001 shares broadly similar COG category composition with other strains, while most interstrain differences reflect copy-number variation within repeat-enriched categories instead of gain or loss of entire functional repertoires. This pattern aligns with pan-genome analyses suggesting that diversification in O. tsutsugamushi is driven in part by duplication and divergence of existing genes rather than extensive acquisition of novel pathways [51].
The same repeat landscape provides a mechanistic basis for pervasive genome rearrangement. In O. tsutsugamushi genomes, repeats are dominated by RAGEs, which encode an F-type T4SS, and by IS elements, both of which likely generate abundant homologous substrates for non-allelic recombination and structural change. RAGEs are conceptually related to F plasmid-like conjugation systems but exist as chromosomally integrated elements in O. tsutsugamushi. RAGE-like elements also occur in other Rickettsiaceae, including Rickettsia bellii, Rickettsia massiliae, Rickettsia felis, and Rickettsia buchneri. However, unlike the extensive amplification and degeneration seen in O. tsutsugamushi, these species typically carry only one or a few intact or near-intact elements [49,52]. Beyond integrases and Tra/Trb modules, RAGE regions commonly carry diverse cargo genes including multicopy ankyrin- and TPR-repeat proteins, several of which have been experimentally implicated in host-pathway manipulation [53,54,55]. In HMU_001, most RAGEs and many insertion sequences are fragmented, and even the most complete RAGE candidates show disrupted mobilization modules, indicating substantial decay. Nevertheless, intact copies of key mobilization genes persist in a dispersed configuration, and the high density of homologous repeats could continue to promote recombination-driven rearrangements even if only a minority of elements retains residual mobility [28]. Consistent with an expansion–degeneration cycle, domain-based inspection (SMART) and pseudogene prediction indicate that repetitive genes are disproportionately prone to pseudogenization compared with nonrepetitive genes, as previously noted for strain Ikeda [27], and more broadly across Rickettsiales, where pseudogenes can arise through progressive degradation of ancestral genes as well as rapid decay of genes gained by horizontal transfer or duplication [56]. Notably, pseudogenization rates are relatively similar among non-repetitive genes across strains, but are much higher and more variable among repetitive genes. Together, these observations suggest that strain-specific patterns of repeat proliferation and degeneration, rather than expansion or contraction of the conserved backbone, may represent an important feature of intraspecific genome evolution in O. tsutsugamushi.
Genome-wide phylogenies provide a useful framework for interpreting strain relatedness and broad patterns of genetic clustering in O. tsutsugamushi. In our orthogroup analysis, 95.5% of genes clustered into shared orthogroups, yet only 664 orthogroups were universally present and 643 were single-copy in all genomes, underscoring a relatively constrained core against a large and dynamic accessory complement. The maximum-likelihood phylogeny inferred from these 643 single-copy core genes resolved four clades and placed HMU_001 with Wuj/2014 from Zhejiang, TW-1 from Taiwan, and UT76 from Thailand, whereas the remaining clades grouped with strains from South Asia, Japan, Korea, and the India–Myanmar border region, indicating a geographic signal that is present but incomplete. However, geographic interpretation should remain cautious, because no formal population-structure analyses were performed, and phylogeographic inference is currently limited by sparse genome sampling and extensive recombination. The tsa56 genotyping, the most widely used scheme for strain classification, was only partly congruent with the core-genome clades in our dataset. This discordance is biologically plausible for an immunodominant surface antigen subject to locus-specific recombination and selection, and it reinforces the limitations of single-locus classification in a highly recombinant species. When closed genomes are not available, an immediately testable interim approach is augmenting TSA56 with additional antigen loci; by which concatenated ScaA and TSA56 amino acid sequences might be an alternative option to better reflect the true situation of genome-scale phylogeny than TSA56 alone [25]. By adding a closed genome for the TA763_A lineage, HMU_001 strengthens the reference framework for interpreting tsa56-based calls and supports continued expansion of complete genomes across underrepresented regions and genotypes to improve molecular epidemiology and inference of strain movement.
Beyond variation in repeat burden and pseudogenization, genome structure is exceptionally labile in O. tsutsugamushi. Although other intracellular bacteria such as Wolbachia pipientis also harbored abundant mobile genetic elements and undergo genome rearrangement, the extent of erosion is substantially greater in O. tsutsugamushi [50]. In contrast, many Rickettsia species maintain compact, relatively stable genomes with conserved gene order, with notable exceptions such as Rickettsia bellii and Rickettsia felis [57,58]. Earlier complete-genome studies and subsequent long-read comparisons have emphasized a mosaic architecture in which core gene islands were interleaved with large repeat blocks, with disorder in the arrangement of the core gene islands across strains at the same time [22,28].
Our pairwise synteny analyses extend this framework quantitatively across the 17 closed genomes. HMU_001 shared the highest syntenic fractions with its closest relatives UT76, TW-1, and Wuj/2014, yet these alignments were broken into over a hundred syntenic blocks, indicating that numerous rearrangement breakpoints can accumulate even within a single clade. At the species scale, appreciable collinearity was restricted to near-identical pairs such as JJOtsu6–JJOtsu1 and Wuj/2014–TW-1, which retained nearly complete syntenic coverage but still differed by dozens of blocks. In contrast, the most distant comparisons, such as Boryong–TA686, retained only about one third syntenic sequence and were fragmented into more than a hundred of blocks. Moreover, the relationship between syntenic fraction and block count formed an almost continuous gradient across most genome pairs rather than discrete strata, supporting a model of cumulative, ongoing remodeling rather than rare episodic “scrambling” events. These patterns are consistent with repeated non-allelic homologous recombination between abundant repeat copies, many of which are derived from RAGE and IS elements, plausibly contributing to inversions, translocations, and segment loss while leaving the conserved backbone comparatively intact. This degree of structural turnover complicates comparative inference because genomic context and conserved-island boundaries are frequently not transferable between strains and are best assessed using closed genomes. At the same time, highly syntenic strain pairs provide informative snapshots of early divergence, offering an opportunity to quantify how rapidly rearrangements accumulate and to test whether structural trajectories correlate with local transmission dynamics or strain-level phenotypes.
Finally, our study highlights a practical way to generate closed O. tsutsugamushi genomes while reducing hands-on manipulation. The extreme repeat content of O. tsutsugamushi complicates closure with short-read data, and long-read approaches depend on careful DNA preparation and sufficiently long molecules to span repeat regions. Earlier workflows often relied on extensive enrichment or purification from host cells, including density-gradient methods, or enzymatic treatment to eliminate residual host DNA to obtain high-quality bacterial genomic DNA material. In contrast, we sequenced HMU_001 as a nonclonal isolate without plaque purification or density-gradient enrichment, combining deep long-read sequencing with stringent host read removal and short-read polishing to obtain a closed genome, and we supported within-isolate homogeneity using genome-wide allele-frequency profiles and stability of tsa loci across passages. This reduced handling is operationally relevant because O. tsutsugamushi propagation is technically demanding and typically requires biosafety level 3 facilities. In parallel, probe-based target enrichment strategies from human and chigger samples have been developed to address low pathogen DNA and the technical, safety, and cost constraints of cell culture, enabling genome-scale phylogenetic analyses when culture or genome closure is not feasible [59]. Together, reduced-purification culture-to-genome workflows and culture-independent enrichment should help build denser, geographically representative genome panels for surveillance and for testing how repeat-mediated genome remodeling relates to transmission and disease.
Our study has limitations. First, our inferences about genome compaction and repeat dynamics are based on a single contemporary isolate, so the full range of repeat burden and RAGE configurations circulating in Hainan remains to be defined. Second, variation in repeat burden, RAGE integrity, and putative effector repertoires was inferred from sequence and domain annotations, and the biological consequences of these differences require experimental validation. Third, even with standardized re-annotation, repeat-dense regions remain difficult to annotate and compare consistently across studies, which can influence gene and pseudogene estimates. Despite these caveats, HMU_001 provided a high-quality closed genome from a recent clinical isolate in a high-incidence setting, and strengthened the empirical foundation for linking repeat proliferation, genome rearrangement, and strain typing in O. tsutsugamushi.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Bhandari M. Singh R.K. Laishevtcev A. Mohapatra T.M. Nigam M. Mori E. Vasconcelos de Lacerda B.C.G. Coutinho H.D.M. Mishra A.P. Revisiting scrub typhus: A neglected tropical disease Comp. Immunol. Microbiol. Infect. Dis.20229091, 10188810.1016/j.cimid.2022.10188836252451 · doi ↗ · pubmed ↗
- 2Xu G. Walker D.H. Jupiter D. Melby P.C. Arcari C.M. A review of the global epidemiology of scrub typhus P Lo S Negl. Trop. Dis.201711 e 000606210.1371/journal.pntd.000606229099844 PMC 5687757 · doi ↗ · pubmed ↗
- 3Walker D.H. Scrub Typhus-Scientific Neglect, Ever-Widening Impact N. Engl. J. Med.201637591391510.1056/NEJ Mp 160849927602663 · doi ↗ · pubmed ↗
- 4Jiang J. Richards A.L. Scrub Typhus: No Longer Restricted to the Tsutsugamushi Triangle Trop. Med. Infect. Dis.201831110.3390/tropicalmed 301001130274409 PMC 6136623 · doi ↗ · pubmed ↗
- 5Alkathiry H.A. Alghamdi S.Q. Morgan H.E.J. Noll M.E. Khoo J.J. Alagaili A.N. Makepeace B.L. Molecular Detection of Candidatus Orientia chuto in Wildlife, Saudi Arabia Emerg. Infect. Dis.20232940240610.3201/eid 2902.22113136692446 PMC 9881777 · doi ↗ · pubmed ↗
- 6Abarca K. Martínez-Valdebenito C. Angulo J. Jiang J. Farris C.M. Richards A.L. Acosta-Jamett G. Weitzel T. Molecular Description of a Novel Orientia Species Causing Scrub Typhus in Chile Emerg. Infect. Dis.2020262148215610.3201/eid 2609.20091832818400 PMC 7454112 · doi ↗ · pubmed ↗
- 7Dasgupta S. Asish P.R. Rachel G. Bagepally B.S. Chethrapilly Purushothaman G.K. Global seroprevalence of scrub typhus: A systematic review and meta-analysis Sci. Rep.2024141089510.1038/s 41598-024-61555-938740885 PMC 11091130 · doi ↗ · pubmed ↗
- 8Gaba S. Gupta M. Singla N. Singh R. Clinical outcome and predictors of severity in scrub typhus patients at a tertiary care hospital in Chandigarh, India J. Vector Borne Dis.20195636737210.4103/0972-9062.30204133269738 · doi ↗ · pubmed ↗
