Draft genome dataset of Xylaria sp. KR-3U isolated from leaves of the medicinal plant Catharanthus roseus

Kankana Roy; Abhijit Bandyopadhyay

PMC · DOI: 10.1016/j.dib.2026.112543 · February 7, 2026

Open Access

TL;DR

This paper provides a draft genome sequence of Xylaria sp. KR-3U, an endophyte from Catharanthus roseus leaves in India, including gene annotations and biosynthetic clusters.

Contribution

The study presents a high-quality draft genome and functional analysis of Xylaria sp. KR-3U, including biosynthetic gene clusters and enzyme profiles.

Findings

01

The genome assembly has 11,916 predicted protein coding genes with 97.0% completeness based on BUSCO analysis.

02

The dataset includes 111 biosynthetic gene clusters and 556 CAZyme-encoding genes, with 39.74% predicted to be secreted.

03

Genome data and annotations are publicly available through NCBI and Mendeley Data for reuse and transparency.

Abstract

We present a draft genome dataset for Xylaria sp. (KR-3U) isolated as an endophyte from Catharanthus roseus leaves in India. Whole genome sequencing was performed using the Illumina NovaSeq 6000 platform, generating 35.2 million paired-end raw reads (150 bp), providing ∼120× coverage (∼5.32 Gb of raw data) for a 44.24 Mb assembly (960 contigs >1 kb, GC content of 47.76%, and an N50 of 101,126 bp). Read remapping showed 96.07% alignment to the assembly. BUSCO (fungi_odb10) analysis indicated 97.0% completeness. Gene prediction using AUGUSTUS identified 11,916 protein-coding genes. BLASTp searches against the Swiss-Prot database yielded significant hits for 7299 proteins, of which 7204 were mapped to Gene Ontology (GO) terms and 5869 sequences received functional annotations. Integration of InterProScan-supported annotations resulted in 5645 proteins assigned at least one GO term. KEGG KAAS…

Taxonomy

Topics: Microbial Natural Products and Biosynthesis · Synthesis and Biological Activity · Genomics and Phylogenetic Studies

Full text

Specifications Table

Subject: Mycology, Fungal Genomics
Specific subject area: Whole genome sequencing and functional annotation of endophytic fungi
Data format: Genome assembly (FASTA), raw reads (FASTQ), annotated gene dataset (GFF3), graphical figures (PNG)
Type of data: Tables; Figures; Annotated sequence files
Data collection: Symptomless leaves of Catharanthus roseus were surface sterilized and cultured on potato dextrose agar (PDA) media for fungal isolation. DNA was extracted using a Lucigen kit; library preparation and sequencing were performed using the Illumina NovaSeq 6000 platform (paired-end 150 bp). Quality of raw reads was checked with FastQC v0.12.1 and MultiQC v1.14, and trimming was performed with fastp v0.20.1. De novo assembly was generated using MEGAHIT v1.2.9. Assembly completeness was assessed using BUSCO v5.3.2 (fungi_odb10 dataset). Functional annotation was performed using AUGUSTUS, BLASTp/Swiss-Prot, InterProScan, KEGG KAAS, antiSMASH v8.0, and dbCAN3 pipelines.
Data source location: Latitude: 23.23°N; Longitude: 87.86°E; City/Town: Burdwan; State: West Bengal; Country: India
Data accessibility: BioProject: PRJNA1335662 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1335662); BioSample: SAMN52018968 (https://www.ncbi.nlm.nih.gov/biosample/SAMN52018968); SRA: SRR35731853 (https://www.ncbi.nlm.nih.gov/sra/SRR35731853); GenBank accession: JBSEFG000000000. Genome assembly (FASTA), gene annotation (GFF3), and secondary genome analysis files are openly available at Mendeley Data, V3. Direct URL: https://data.mendeley.com/datasets/b8jn5rtwkg/3
Related research article: None

1. Value of the Data

• This dataset provides a high-quality draft genome resource for an endophytic Xylaria species isolated from Catharanthus roseus leaf tissue, supported by 97.0% BUSCO completeness and accompanied by the raw sequencing reads and downstream analysis outputs deposited in open repositories.
• With ∼120× Illumina coverage, 96.07% read remapping, and 11,916 predicted protein-coding genes, together with functional annotation and pathway mapping (GO/InterProScan-supported annotations and KEGG KO assignments), this dataset enables robust analyses of gene repertoires, metabolic pathways, and conserved versus lineage-specific functions within Xylariaceae and related Ascomycota.
• The dataset supports functional and ecological investigations through a curated CAZyme repertoire (556 high-confidence CAZymes), including 221 predicted secreted CAZymes (∼39.7%), facilitating studies of plant-associated enzymatic functions and potential lignocellulose-degrading capacity.
• The identification of 111 biosynthetic gene clusters spanning terpene, NRPS, indole and hybrid classes provides a foundation for comparative BGC mining, assessment of biosynthetic diversity, and exploration of genome-encoded secondary metabolite potential in endophytic fungi.
• All files required for transparent reuse and full computational reproducibility (raw data, intermediate outputs, and final result files) are publicly accessible, enabling independent reanalysis, parameter comparison, and method benchmarking.

2. Background

The genus Xylaria (Xylariaceae, Ascomycota) comprises saprophytic and endophytic fungi commonly associated with decaying wood, soil, and plant tissues [1]. Members of this genus are well known for their ecological roles in wood decomposition and their capacity to synthesize diverse bioactive secondary metabolites, including antimicrobial, anticancer, and antioxidant compounds [2]. An earlier morphological study hinted at the presence of Xylaria as an endophyte in Catharanthus roseus, but it lacked molecular evidence and culture deposition [3]. The draft genome dataset presented here provides a curated genomic resource for an endophytic Xylaria species isolated from C. roseus, supporting integrative investigations into its genomic architecture, functional gene repertoire and biosynthetic potential.

3. Data Description

This section describes the raw sequencing reads, the assembled genome, structural and functional annotations, and secondary analyses (CAZyme and BGC profiling). All files listed below are publicly available through NCBI and Mendeley Data.

The assembled genome of Xylaria sp. KR-3U consists of 960 contigs (>1 kb) with a total size of 44.24 Mb and a GC content of 47.76%. The largest contig is 540,832 bp, with an N50 value of 101,126 bp. K-mer based genome profiling using GenomeScope estimated a haploid genome size of ∼42.56 Mb, with very low heterozygosity (0.144%) and a model fit of 98.36%. The analysis indicated minimal repetitive content (∼0.17% of the genome) and a low sequencing error rate (0.18%), consistent with the assembled genome characteristics. BUSCO analysis using the fungi_odb10 dataset showed 97.0% genome completeness (compared with 95–98% for most Xylaria species), indicating a near-complete representation of the gene space.

Gene prediction using AUGUSTUS identified 11,916 protein-coding genes comprising 40,514 coding exons (CDS) and 28,685 introns. On average, each gene contained 3.4 exons and 2.4 introns. The mean exon length was 458 bp, while introns averaged 117 bp in length. The average gene length was estimated to be 1839 bp. Blast2GO analysis was performed with the 11,916 protein-coding genes predicted by AUGUSTUS. BLASTp searches against the Swiss-Prot database yielded significant hits for 7299 sequences, of which 7204 were mapped to Gene Ontology (GO) terms and 5869 sequences were annotated. Inclusion of InterProScan-supported annotations resulted in 5645 proteins assigned at least one GO term. KEGG annotation assigned 4,144 genes to 3,391 unique KO numbers, while 7772 genes (65.22%) did not receive a KO assignment. KEGG Mapper reconstruction identified 423 pathways, 44 BRITE hierarchies, and 83 KEGG modules.

A total of 556 high-confidence (≥2 tools) carbohydrate-active enzymes (CAZymes) were identified using dbCAN3: 274 Glycoside Hydrolases (GHs), 142 Auxiliary Activity (AA) enzymes, 75 Glycosyl Transferases (GTs), 47 Carbohydrate Esterases (CEs), 16 Polysaccharide Lyases (PLs), and 2 carbohydrate-binding modules (CBMs). SignalP analysis predicted that 39.74% (221 genes; ≥2 tools) of these CAZymes are secreted. In addition, CAZyme annotation was independently performed using Conserved Unique Peptide Patterns (CUPP) to capture potentially divergent or lineage-specific CAZyme candidates. CUPP identified a higher number of CAZyme-associated sequences across the major classes (413 GH, 215 AA, 237 GT, 70 CE, and 28 PL; total 963 CAZymes).

Biosynthetic gene cluster (BGC) analysis identified 111 predicted BGC regions distributed across 104 genome contigs. Predicted clusters included terpene, type I polyketide synthase (T1PKS), non-ribosomal peptide synthetase (NRPS), NRPS-like, indole, isocyanide, fungal-RiPP, and multiple hybrid cluster combinations, with terpene and T1PKS clusters being the most abundant. Comparison against the MIBiG database revealed 27 BGCs with detectable similarity to previously characterized clusters, spanning high, medium, and low confidence matches. The complete antiSMASH output, including the interactive HTML report and associated result files, is provided as part of the deposited dataset (Fig. 1, Fig. 2, Fig. 3; Table 1).

Fig. 1. BUSCO completeness assessment of the Xylaria sp. KR-3U draft genome. BUSCO analysis based on the fungi_odb10 lineage dataset shows that 97.0% (735) of the 758 orthologous groups were complete, of which 96.6% (732) were single-copy and 0.4% (3) duplicated. Only 0.8% (6) were fragmented and 2.2% (17) were missing, indicating high completeness and assembly quality. The missing BUSCOs may be attributable to assembly fragmentation or lineage-specific divergence. BUSCO outputs and the high-resolution Figure 1 are available at Mendeley Data under the BUSCO zip file.

Fig. 2. CAZyme class comparison between dbCAN3 and CUPP in Xylaria sp. KR-3U. Comparison of carbohydrate-active enzyme (CAZyme) gene counts across different functional classes (GH, AA, GT, CE, PL, and CBM). Total CAZyme counts are shown for each class. The annotation files, CSV table and high-resolution Figure 2 are available under the dbCAN3, CUPP and Figures zip files, respectively, in Mendeley Data.

Fig. 3. Biosynthetic gene cluster (BGC) class distribution in Xylaria sp. KR-3U predicted by antiSMASH. Distribution of predicted BGCs across major classes (terpene, T1PKS, NRPS-like, NRPS, indole, and others), showing the number of single-type (pure) and multi-type (hybrid) clusters. The underlying CSV file and high-resolution Figure 3 are deposited in Mendeley Data inside the antiSMASH and Figures folders, respectively.

Table 1. Genome assembly statistics for Xylaria sp. KR-3U.
Metric | Value
Total sequencing reads (paired-end) | 35.2 million
Total assembly size (Mb) | 44.24
# contigs (≥ 0 bp) | 960
# contigs (≥ 1000 bp) | 960
# contigs (≥ 5000 bp) | 733
# contigs (≥ 10,000 bp) | 639
# contigs (≥ 25,000 bp) | 471
# contigs (≥ 50,000 bp) | 288
Largest contig (bp) | 540,832
GC content (%) | 47.76
N50 (bp) | 101,126
N75 (bp) | 51,856
L50 (contigs) | 125
Genome coverage | 242x
Assembly level | Contig
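The class totals and percentages quoted in this section can be re-derived from the reported counts. The following is a minimal Python sketch that uses only numbers stated in the text; the variable names are illustrative and not part of the deposited dataset.

```python
# Counts copied from the Data Description text.
dbcan_classes = {"GH": 274, "AA": 142, "GT": 75, "CE": 47, "PL": 16, "CBM": 2}
cupp_classes = {"GH": 413, "AA": 215, "GT": 237, "CE": 70, "PL": 28}

total_genes = 11_916        # AUGUSTUS protein-coding genes
genes_with_ko = 4_144       # genes that received a KEGG KO number
secreted_cazymes = 221      # dbCAN3 CAZymes predicted as secreted

dbcan_total = sum(dbcan_classes.values())   # reported as 556
cupp_total = sum(cupp_classes.values())     # reported as 963
genes_without_ko = total_genes - genes_with_ko
pct_without_ko = 100 * genes_without_ko / total_genes
pct_secreted = 100 * secreted_cazymes / dbcan_total

print(dbcan_total, cupp_total, genes_without_ko)
print(f"{pct_without_ko:.2f}% without KO; {pct_secreted:.1f}% of CAZymes secreted")
```

Running the same kind of check against the deposited CSV tables would be a quick way to confirm that a download is complete.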

4. Experimental Design, Materials and Methods

4.1. Sample collection and fungal isolation

Visibly healthy, fully expanded leaves were collected from mature, perennial plants of Catharanthus roseus growing in the medicinal plant garden of The University of Burdwan, West Bengal, India (Latitude: 23.23°N; Longitude: 87.86°E), during the early winter season (late November 2021). Samples were first washed under running tap water (10 min), then treated with 0.1% Tween-20 (5 min) and rinsed twice with distilled water (1 min each). Surface sterilization was performed under laminar airflow using 70% ethanol (1 min) followed by 0.1% mercuric chloride (30 s), after which the leaves were rinsed three times with sterile double-distilled water (1 min each). Leaves were blot-dried using sterilized blotting paper, aseptically cut into ∼5 mm midrib-inclusive segments, and plated on potato dextrose agar (PDA) media supplemented with streptomycin sulfate (0.05 g/L). Plates were incubated at 28 ± 1 °C in darkness and monitored daily for fungal emergence. Sterility of the procedure was verified using imprint control plates and by plating the final rinse water on PDA; these control plates were incubated under identical conditions. No microbial growth was observed on the control plates after 14 days of incubation, confirming the effectiveness of the sterilization process.

Fungal colonies emerging from the plated leaf segments were subcultured on fresh PDA plates to obtain pure cultures. The isolate of interest, designated Xylaria sp. KR-3U, was maintained on PDA medium for further genomic analysis.

4.2. DNA extraction and library preparation

Genomic DNA was extracted using the Lucigen DNA extraction kit. DNA concentration was quantified using the Qubit HS DNA assay, yielding a concentration of 132 ng/µL and a total of 2640 ng DNA from a 20 µL eluate. DNA quality and integrity were assessed by agarose gel electrophoresis, which confirmed the presence of high molecular weight genomic DNA suitable for downstream whole genome sequencing. Sequencing libraries were prepared using the standard Illumina library preparation protocol and validated with an Agilent TapeStation 5000 system, showing a peak library size of 503 bp with approximately 80.7% of fragments ranging between 235 and 963 bp, indicating a narrow and uniform library fragment size distribution. Final library concentration was determined to be 40.8 ng/µL prior to sequencing.

4.3. Sequencing and quality control

Whole genome sequencing was performed on an Illumina NovaSeq 6000 platform (150 bp paired-end reads). A total of 35,246,136 raw reads, corresponding to 5.32 Gb of sequence data, were generated. Quality assessment was performed using FastQC v0.12.1 [4] and MultiQC v1.14 [5]; 93.71% of bases were above Q30 before trimming and 94.06% after trimming (Table 3). Adapter trimming and quality filtering were carried out with fastp v0.20.1 with default parameters [6], using the Illumina P7 adapter for read 1 (AGATCGGAAGAGCACACGTCTGAACTCCAGTCA) and the P5 adapter for read 2 (AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT). This retained 34,994,746 high-quality reads (99.29% of the raw reads). The full FastQC and MultiQC HTML reports, fastp JSON and HTML logs, and pre/post trimming read statistics (Table 3) are deposited in Mendeley Data.
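As a worked check of the sequencing depth and read retention, the short Python sketch below divides the raw data volume by the assembly size and the retained read count by the raw read count. It uses only figures reported in this article; the variable names are illustrative.

```python
raw_reads = 35_246_136        # raw reads reported for the run
retained_reads = 34_994_746   # reads remaining after fastp
raw_bases = 5.32e9            # ~5.32 Gb of raw sequence
assembly_size = 44.24e6       # 44.24 Mb assembly

# Depth of coverage = sequenced bases / genome (assembly) size.
coverage = raw_bases / assembly_size
# Share of reads that survived adapter trimming and quality filtering.
retained_pct = 100 * retained_reads / raw_reads

print(f"coverage ~{coverage:.0f}x, retained {retained_pct:.2f}%")
```

The coverage obtained this way corresponds to the ∼120× figure quoted in the abstract.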

4.4. Genome size estimation using k-mer analysis

Genome characteristics were estimated using k-mer-based analysis prior to genome assembly. Raw paired-end Illumina reads were used to generate k-mer frequency counts using Jellyfish v2.3.0 [7]. A k-mer size of 21 was used to construct a k-mer frequency histogram. This histogram file was subsequently used as input for GenomeScope v1.0 [8], which was applied to estimate genome size, heterozygosity, repeat content, and sequencing error rate. GenomeScope outputs, including model-fitted plots and summary statistics, are deposited in Mendeley Data.
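To illustrate what a k-mer frequency histogram contains, here is a minimal pure-Python sketch. It is not the Jellyfish implementation and would be far too slow for real data. Counting canonical k-mers (a k-mer and its reverse complement treated as one) is an assumption here, since the article does not state which counting mode was used.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)

def kmer_histogram(reads, k=21):
    """Map each multiplicity to the number of distinct canonical k-mers seen that often."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):   # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())

# Toy usage with k=3: ACG and CGT are reverse complements, so they share one count.
print(sorted(kmer_histogram(["ACGTA"], k=3).items()))
```

A two-column table of this kind (multiplicity, number of distinct k-mers) is the input that GenomeScope fits its model to.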

4.5. Genome assembly and quality assessment

De novo assembly was performed using MEGAHIT v1.2.9 [9] with multiple k-mer sizes (21, 49, 77, 105, 133 and 141). Contigs shorter than 1000 bp were removed. Assembly statistics including total assembly length, N50, L50, GC content, number of contigs, and size distribution were generated using QUAST v5.2.0 [10]. QUAST HTML reports and summary statistics were generated and archived in Mendeley Data.
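For readers unfamiliar with the contiguity metrics, the sketch below shows how N50 and L50 follow from a list of contig lengths. It is a minimal illustration of the standard definitions, not QUAST's code; the function name and the `min_length` parameter are hypothetical.

```python
def n50_l50(lengths, min_length=0):
    """Return (N50, L50) for contigs at least min_length bp long.

    N50 is the length of the contig at which the running total, taken from
    the longest contig downwards, first reaches half of the assembly size.
    L50 is the number of contigs needed to reach that point.
    """
    ordered = sorted((n for n in lengths if n >= min_length), reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no contigs supplied")

# Toy example: total 30 bp, so half is reached by the two longest contigs.
print(n50_l50([2, 4, 6, 8, 10]))
```

Applied to the contig lengths of the deposited FASTA with `min_length=1000`, this definition should give values comparable to those in the QUAST report.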

4.6. Contamination screening and generation of clean assembly

During submission to the NCBI Whole Genome Shotgun (WGS) database, the assembly underwent NCBI’s automated contamination screening pipeline, which identified and removed nine short bacterial contigs (<2 kb each). The decontaminated assembly retained all major assembly metrics, with only a slight reduction in total contigs (from 969 to 960). This filtered assembly was used for all downstream analyses.

4.7. Read mapping to the clean assembly

Paired-end quality-filtered Illumina reads were mapped back to the decontaminated genome assembly to assess assembly accuracy and read representation. Read alignment was performed using Bowtie2 v2.4.4 [11] in end-to-end mode with default parameters, yielding a 96.07% mapping rate, indicating high assembly accuracy. The resulting SAM file was converted to BAM and indexed using SAMtools v1.13 [12]. Alignment files and mapping logs are deposited in Mendeley Data.
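The mapping rate can be read back from a Bowtie2 log, because Bowtie2 ends its alignment summary with a line of the form "NN.NN% overall alignment rate". The helper below is a small sketch; it assumes the deposited mapping log keeps that standard summary line, and the function name is hypothetical.

```python
import re

def overall_alignment_rate(log_text: str) -> float:
    """Extract the overall alignment rate (in percent) from Bowtie2 summary text."""
    match = re.search(r"([\d.]+)% overall alignment rate", log_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))

# Usage with a one-line excerpt in the standard Bowtie2 format.
print(overall_alignment_rate("96.07% overall alignment rate\n"))
```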

4.8. Gene prediction and genome completeness assessment

Protein-coding genes were predicted from the clean genome assembly using AUGUSTUS v3.4.0 [13], employing Histoplasma capsulatum as the training model. Predicted gene models were exported as GFF3 files, along with the corresponding predicted protein (FASTA) and transcript sequences. Assembly completeness was assessed using BUSCO v5.2.2 [14] with the fungi_odb10 lineage dataset, revealing 97.0% completeness (Table 2).

Table 2. Summary of BUSCO scores.
Complete BUSCOs (C) | 735 (97.0%)
Complete and single-copy BUSCOs (S) | 732 (96.6%)
Complete and duplicated BUSCOs (D) | 3 (0.4%)
Fragmented BUSCOs (F) | 6 (0.8%)
Missing BUSCOs (M) | 17 (2.2%)
Total BUSCO groups searched | 758

Table 3. Summary of read quality before and after trimming.
Parameters | Before trimming | After trimming
Sequencing type | Paired-end (151 bp × 2) | Paired-end (148 bp × 2)
Total reads (million) | 35.246 M | 34.995 M
Total bases (Gb) | 5.322 Gb | 5.193 Gb
Q20 bases (%) | 97.72% | 97.97%
Q30 bases (%) | 93.71% | 94.06%
GC content (%) | 47.87% | 47.76%
Reads retained after filtering (%) | - | 99.29%
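BUSCO percentages are simple ratios of category counts to the number of groups searched. The sketch below recomputes them from the counts in Table 2 (732 single-copy, 3 duplicated, 6 fragmented, 17 missing, 758 searched); the dictionary keys mirror the usual BUSCO letters, and the variable names are illustrative.

```python
busco_counts = {"S": 732, "D": 3, "F": 6, "M": 17}
total_groups = 758

# Complete BUSCOs are the single-copy plus the duplicated ones.
complete = busco_counts["S"] + busco_counts["D"]

# Every searched group falls into exactly one of C, F, or M.
if complete + busco_counts["F"] + busco_counts["M"] != total_groups:
    raise ValueError("BUSCO categories do not add up to the groups searched")

percentages = {
    label: round(100 * count / total_groups, 1)
    for label, count in {"C": complete, **busco_counts}.items()
}
print(complete, percentages)
```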

4.9. Functional annotation of predicted proteins

Functional annotation of predicted protein-coding genes was performed using Blast2GO v6.0.3 [15]. BLASTp searches were executed via the NCBI BLAST web service against the Swiss-Prot database, applying a taxonomy filter for Ascomycota (taxid: 4890). An E-value cutoff of 1e-5 was used, retaining up to 20 hits per query, with low-complexity filtering enabled, word size set to 3, and a high-scoring segment pair (HSP) length cutoff of 33. Gene Ontology (GO) mapping and annotation were performed within Blast2GO using the GO Annotation (GOA) database version 2025.03 and UniProt ID mapping, with GO terms assigned across the biological process, molecular function, and cellular component categories. Protein domain and sequence feature annotation was conducted using InterProScan through the EBI public web service as implemented in Blast2GO. InterProScan outputs were generated in XML format and integrated with the BLAST and GO-mapped annotations to produce the final functional annotation dataset. Functional pathway annotation was performed using KEGG KAAS v2.1 [16], which assigned KEGG Orthology numbers to the predicted protein sequences for pathway mapping. All Blast2GO outputs are available at Mendeley Data in the Blast2GO zip file, together with the KEGG output tables and HTML files.
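The hit-retention rule described above (E-value cutoff of 1e-5, at most 20 hits per query) can be sketched in a few lines of Python. This is an illustration only, not Blast2GO's internal logic; the tuple layout, the function name, and the choice of "less than or equal to" for the cutoff are assumptions.

```python
from collections import defaultdict

def filter_hits(hits, max_evalue=1e-5, max_hits=20):
    """Keep the best hits per query.

    hits: iterable of (query_id, subject_id, evalue) tuples.
    Returns a dict mapping each query to its retained (subject_id, evalue)
    pairs, sorted from the smallest E-value upwards.
    """
    per_query = defaultdict(list)
    for query, subject, evalue in hits:
        if evalue <= max_evalue:
            per_query[query].append((subject, evalue))
    return {
        query: sorted(pairs, key=lambda pair: pair[1])[:max_hits]
        for query, pairs in per_query.items()
    }

# Toy usage with made-up identifiers.
example = [("g1", "P1", 1e-30), ("g1", "P2", 1e-3), ("g2", "P3", 1e-6)]
print(filter_hits(example))
```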

4.10. Carbohydrate active enzyme annotation

CAZymes were identified with the dbCAN3 web server [17], combining HMMER (E-value < 1e-15, coverage > 0.35), DIAMOND (E-value < 1e-102), and Hotpep-based searches against the CAZy database. To ensure high-confidence annotation, only CAZyme genes supported by at least two independent methods (≥2 tools consensus) were retained. Family assignments followed CAZy nomenclature (GH, GT, CE, PL, AA, and CBM). Secreted proteins were identified using SignalP 6.0 [18] (positive signal peptides). In addition, CAZyme prediction was independently performed using the Conserved Unique Peptide Patterns (CUPP) web server, version 2.1.0 [19], to explore potentially divergent or lineage-specific CAZyme candidates. Results from dbCAN3 and CUPP were compared to assess methodological consistency and to provide complementary insights into the CAZyme repertoire of Xylaria sp. KR-3U. All output files are archived in Mendeley Data.
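The "at least two tools" consensus rule can be expressed as a small filter. The sketch below assumes the per-gene calls have already been parsed into a dictionary; that data layout, the function name, and the toy gene identifiers are hypothetical and do not reflect the dbCAN3 output format.

```python
def consensus_cazymes(predictions, min_tools=2):
    """Keep genes whose CAZyme call is supported by at least min_tools tools.

    predictions: dict mapping gene_id -> dict of tool_name -> family string,
    with None or "" where a tool made no call.
    """
    kept = {}
    for gene, calls in predictions.items():
        supported = {tool: family for tool, family in calls.items() if family}
        if len(supported) >= min_tools:
            kept[gene] = supported
    return kept

# Toy usage: g1 is supported by two tools, g2 by only one.
toy = {
    "g1": {"HMMER": "GH5", "DIAMOND": "GH5", "Hotpep": None},
    "g2": {"HMMER": None, "DIAMOND": "AA9", "Hotpep": None},
}
print(sorted(consensus_cazymes(toy)))
```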

4.11. Secondary metabolite biosynthetic gene cluster prediction

Secondary metabolite biosynthetic gene clusters were predicted using the fungal module of antiSMASH v8.0.4 [20] with default fungal parameters, including hybrid cluster detection and comparison against the MIBiG database; MIBiG matches were classified as high, medium, or low confidence. Relaxed detection strictness was used to maximize recovery of divergent and cryptic fungal BGCs, which are common in endophytic fungi. The antiSMASH HTML, JSON, and GBK files are available in Mendeley Data (Table 4).

Table 4. Data inventory linking manuscript components to deposited files.
Manuscript component | File/Folder name (Mendeley Data V.3 / NCBI) | Repository | Notes/Content
Raw sequencing reads (Illumina NovaSeq 6000) | SRR35731853 | NCBI SRA | Paired-end 150 bp Illumina raw reads
Read quality control (raw reads) | FastQC | Mendeley Data | Per-sample FastQC reports for raw reads
Read quality control (summary) | MultiQC | Mendeley Data | Integrated QC summary
Read preprocessing | fastp | Mendeley Data | Adapter trimming and quality filtering reports and logs
Genome assembly | JBSEFG000000000 | NCBI | Decontaminated MEGAHIT assembly (960 contigs) used for all downstream analyses
Assembly quality assessment | QUAST | Mendeley Data | QUAST summary statistics and HTML report
Contamination screening | Contamination screening | Mendeley Data; NCBI | Contamination screening provided by NCBI
Genome characteristics estimation | GenomeScope | Mendeley Data | K-mer distribution and genome size estimation
Read alignment mapping | Bowtie2 | Mendeley Data | BAM files, index and mapping logs
Gene prediction and annotation | AUGUSTUS | Mendeley Data | GFF3 and protein files
BUSCO completeness assessment (Figure 1) | BUSCO | Mendeley Data | BUSCO v5.2.2 summary and outputs
Functional annotation (GO, BLAST, InterProScan) | Blast2Go | Mendeley Data | BLASTp-based functional annotation and Gene Ontology assignments
CAZyme annotation, dbCAN3 (Figure 2) | dbCAN3 | Mendeley Data | CAZyme predictions based on dbCAN3 consensus (HMMER + DIAMOND + Hotpep)
Additional CAZyme annotation (Figure 2) | CUPP | Mendeley Data | Independent CAZyme classification using CUPP, full outputs
Biosynthetic gene cluster predictions (Figure 3) | antiSMASH | Mendeley Data | antiSMASH v8.0 (fungal module); full HTML, GBK and JSON outputs
Phylogenetic analysis | Phylogeny | Mendeley Data | ITS alignment files, combined FASTA files, phylogenetic tree file
Workflow documentation | Workflow | Mendeley Data | Workflow chart from fungal DNA extraction to genome annotations
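The distinction between single-type (pure) and multi-type (hybrid) clusters can be sketched as a simple tally over the product types assigned to each region. The input layout below (one list of product types per region) and the function name are assumptions for illustration; extracting those lists from the antiSMASH JSON or GBK files is not shown.

```python
from collections import Counter

def summarize_regions(regions):
    """Tally pure and hybrid BGC regions.

    regions: list of lists, each holding the product types of one region.
    Returns (pure, hybrid) Counters; hybrid keys join the sorted types with '+'.
    """
    pure, hybrid = Counter(), Counter()
    for products in regions:
        unique = sorted(set(products))
        if not unique:
            continue                      # region without a product type
        if len(unique) == 1:
            pure[unique[0]] += 1
        else:
            hybrid["+".join(unique)] += 1
    return pure, hybrid

# Toy usage with made-up regions.
pure, hybrid = summarize_regions([["terpene"], ["T1PKS"], ["NRPS", "T1PKS"], ["terpene"]])
print(dict(pure), dict(hybrid))
```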

4.12. Molecular identification and phylogenetic analysis

Genomic DNA was extracted from freshly grown mycelial biomass using a standard phenol-chloroform extraction protocol. The internal transcribed spacer (ITS1–5.8S–ITS2) region was amplified using the universal fungal primers ITS1 (5′-TCCGTAGGTGAACCTGCGG-3′) and ITS4 (5′-TCCTCCGCTTATTGATATGC-3′). Amplicons were visualized on a 1.2% agarose gel, purified by PEG-NaCl precipitation, and Sanger-sequenced bidirectionally on an ABI® 3730XL automated DNA sequencer. Forward and reverse chromatograms were quality checked and assembled using the Lasergene package (DNASTAR) to generate a high-quality consensus sequence. Preliminary identification was performed using BLASTn against the NCBI nucleotide database.

For phylogenetic placement, ITS sequences of representative Xylaria species, including Xylaria sp. KR-3U, and outgroup taxa (Hypoxylon spp.) were retrieved from GenBank. Sequences were aligned using MAFFT v7.5.1 [21] with the FFT-NS-i algorithm and default parameters. Maximum-likelihood (ML) phylogenetic analysis was performed in IQ-TREE v3.0.1 [22]. The best-fit nucleotide substitution model (TIM2+R4) was selected automatically using ModelFinder under the Bayesian Information Criterion (BIC). Branch support was assessed using 1000 ultrafast bootstrap replicates (UFBoot) and 1000 SH-aLRT tests. The resulting ML tree was visualized and annotated in FigTree v1.4.4 [23]. All raw phylogenetic data, including unaligned ITS sequences, the MAFFT alignment (FASTA), the Newick tree file, the model selection log, and the complete IQ-TREE output files, are deposited in Mendeley Data for independent verification.
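Model selection under BIC picks the candidate with the lowest value of BIC = -2 ln L + k ln n, where ln L is the log-likelihood, k the number of free parameters, and n the sample size. The Python sketch below illustrates the criterion only; taking n as the number of alignment sites is an assumption, and the candidate names and numbers in the usage example are invented toy values, not results from this dataset.

```python
import math

def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)

def best_model(candidates, n_sites):
    """Return the name of the candidate with the lowest BIC.

    candidates: dict mapping model name -> (log_likelihood, n_free_parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))

# Toy usage: the richer model fits slightly better but is penalized for its size.
toy_models = {"simple": (-1000.0, 10), "complex": (-995.0, 20)}
print(best_model(toy_models, n_sites=500))
```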

Limitations

This draft genome assembly was generated using Illumina short-read sequencing and remains fragmented due to the absence of long-read data. Gene prediction and functional annotations are computationally inferred and may be influenced by database coverage and software versions. Despite these limitations, this paper provides a high-coverage, contamination-screened genome suitable for downstream comparative and functional genomic analysis.

Ethics Statement

This study did not involve the use of human or animal subjects. The authors confirm that the manuscript represents original work and has not been previously published or submitted elsewhere.

CRediT Author Statement

Kankana Roy: Methodology, Investigation, Data curation, Writing - original draft; Abhijit Bandyopadhyay: Supervision, Writing – review & editing.

Bibliography

[1] Rajtar N., Kielsmeier-Cook J.C., Held B.W., Toapanta-Alban C.E., Ordonez M.E., Barnes C.W., Blanchette R.A. Diverse Xylaria in the Ecuadorian Amazon and their mode of wood degradation. Botanical Studies 64 (2023) 30. doi:10.1186/s40529-023-00403-x. PMID: 37878199; PMCID: PMC10600087.
[2] Chen W., Yu M., Chen S., Gong T., Xie L., Liu J., Bian C., Huang G., Zheng C. Structures and biological activities of secondary metabolites from Xylaria spp. Journal of Fungi 10 (3) (2024) 190. doi:10.3390/jof10030190. PMID: 38535199; PMCID: PMC10971283.
[3] Siva V., Ramesh V., Manikanandan S. Antibacterial activities of endophytic Xylaria sp. Phoma sp strain from Catharanthus roseus L and Vitex negundo L against drug resistant Pseudomonas syringae (MTCC 673), Proteus mirabilis (MTCC 1429), Burkholderia glumae (MTCC 8496) and Moraxella bovis (MTCC 1775) strains. Journal of Emerging Technologies and Innovative Research 10 (3) (2023) h695–h708.
[4] Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
[5] Ewels P., Magnusson M., Lundin S., Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32 (19) (2016) 3047–3048. doi:10.1093/bioinformatics/btw354. PMID: 27312411; PMCID: PMC5039924.
[6] Chen S., Zhou Y., Chen Y., Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34 (17) (2018) i884–i890. doi:10.1093/bioinformatics/bty560. PMID: 30423086; PMCID: PMC6129281.
[7] Marçais G., Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27 (6) (2011) 764–770. doi:10.1093/bioinformatics/btr011. PMID: 21217122; PMCID: PMC3051319.
[8] Vurture G.W., Sedlazeck F.J., Nattestad M., Underwood C.J., Fang H., Gurtowski J., Schatz M.C. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33 (14) (2017) 2202–2204. doi:10.1093/bioinformatics/btx153. PMID: 28369201; PMCID: PMC5870704.