Systematic Mining of Bioactive Compounds for Wound Healing From Cayratia Japonica Exosome-Like Nanovesicles: A Workflow Combining LC-MS and DeepSeek Models
Qiang Fu, Wei Ji, Yu-Ping Fan, Jian Yao, Ming-Xia Song, Qiao-Jing Yan

TL;DR
The paper introduces a new method combining LC-MS and AI to identify wound-healing compounds in plant-derived nanovesicles.
Contribution
A novel multimodal framework using LC-MS and DeepSeek models to mine bioactive compounds from plant exosome-like nanovesicles.
Findings
LC-MS identified 46 naturally active compounds in Cayratia japonica exosome-like nanovesicles.
The DeepSeek model filtered compounds with wound-healing and anti-inflammatory functions.
The framework integrates traditional analytical methods with AI for efficient data mining.
Abstract
Plant-derived exosome-like nanovesicles (P-ELNs) effectively deliver bioactive compounds due to their high biocompatibility and low immunogenicity. While liquid chromatography-mass spectrometry (LC-MS) profiles compounds in complex samples, its analysis of large datasets remains limited by traditional methods. Recent advances in large language models (LLMs) and domain-specific systems have enhanced Chinese biomedical data processing and cross-modal pharmaceutical research. This study aimed to create a multimodal framework of LC-MS combined with DeepSeek models for data mining of compounds with wound-healing properties from exosome-like nanovesicles derived from Cayratia japonica (CJ-ELNs). LC-MS identified compounds enriched in CJ (n=3) and CJ-ELNs (n=3), and then compounds specifically enriched in CJ-ELNs were filtered via a four-step filtering workflow. The CJ-ELNs-specific…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6| Key terms | Functions | Tools/databases |
|---|---|---|
| mzML | Standardized data storage | ProteoWizard |
| DeepSeek-Bio | Biological pathway association analysis | Deepseek 671B Model Network Edition |
| Morgan | Digital characterization of molecular structures | Chemoinformatics software packages |
| PubMedBERT | Literature feature extraction | PubMed.pro |
| Grad-CAM | Visualization of model decisions | Deep learning frameworks (eg, PyTorch) |
| ASR | automatic semantic recognition | The Great Prophecy Model of Human-Computer Interaction |
| Compound | Functions | Categories | Possibility |
|---|---|---|---|
| (-)-Epicatechin 3-O-gallate | Antioxidant, anti-inflammatory, anti-cancer, cardiovascular protection, glucose and lipid metabolism regulation. | Organic compound, antioxidant factor, anti-inflammatory factor, energy metabolism, phenolic factor | Wound healing: ×, tissue regeneration: ×, antioxidant: √, anti-cellular damage: ? |
| Rutin | Antioxidant and anti-inflammatory, maintaining vascular resilience, reducing vascular permeability and fragility, exhibiting certain antiviral and anticancer effects. | Flavonoids, antioxidant, anti-inflammatory | Wound healing: ×, tissue regeneration: ×, antioxidant: √, anti-cellular damage: ? |
| Caffeine | Central nervous system stimulants, enhance mental alertness, alleviate fatigue. | Organic compounds, alkaloids, energy metabolism | Wound healing: ×, tissue regeneration: ×, antioxidant: ×, anti-cellular damage: × |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsExtracellular vesicles in disease · Pharmacological Effects of Natural Compounds · Silymarin and Mushroom Poisoning
Introduction
Plant-derived exosome-like nanovesicles (P-ELNs) contain abundant bioactive molecules, serving as novel carriers of natural products to mediate intercellular communication and mediate physiological processes [12]. P-ELNs are superior to conventional mammalian-derived exosomes, possessing unique advantages such as high biocompatibility, high skin permeability, low cytotoxicity and low immunogenicity [34]. Multiple in vitro and in vivo studies indicate that these P-ELNs possess intrinsic therapeutic activity, offering promise for disease treatment and enhancing human health [56]. Cayratia japonica, a traditional Chinese medicinal herb, is widely used for the treatment of traumatic injuries such as contusions and lacerations [7]. Recent clinical studies have confirmed that topical application of CJ ointment effectively alleviates local inflammation and promotes the repair and regeneration of damaged tissue, demonstrating favorable therapeutic outcomes in the management of postoperative infectious wounds around the anus [8]. However, research and application of exosome-like nanovesicles (ELNs) derived from CJ remain incomplete. Our research team successfully extracted and characterized a novel type of P-ELNs from the traditional Chinese medicinal herb Cayratia japonica, namely Cayratia japonica exosome-like nanovesicles (CJ-ELNs). They possess efficient delivery of bioactive compounds to wound sites, thus favoring tissue regeneration from infectious wound-related disorders. Bioactive constituents encapsulated within CJ-ELNs are dominant in wound healing. Consequently, the identification and characterization of bioactive compounds responsible for wound healing are of paramount significance.
Great strides have been made in the screening of active ingredients from natural products via omics techniques [9]. Liquid chromatography–mass spectrometry (LC-MS) has emerged as a powerful tool for profiling trace-level compounds in complex samples, although its performance in processing massive data is limited by traditional manual or rule-based analytical approaches [1011]. In recent years, large-scale pretrained language models (LLMs), such as ChatGPT, GPT-4, and domain-specific systems like DeepSeek, have significantly transformed the landscape of biomedical data analysis and knowledge discovery [1112]. These models exhibit powerful capabilities in natural language understanding, semantic reasoning, and prompt-based knowledge retrieval [13-15]. They are promising tools to assist omics analysis. In particular, DeepSeek models have been widely adopted for optimizing Chinese-language biomedical contexts, and supporting cross-modal tasks in pharmaceutical research, such as entity recognition, document summarization, and semantic ranking [1617].
In this study, we innovatively created a multimodal framework of LC-MS combined with DeepSeek models for data mining of compounds with wound-healing properties from CJ-ELNs. This work illustrates the potential of artificial intelligence (AI) as a computational engine in natural compound discovery and offers a scalable solution for mining multimodal biochemical data.
Methods
Preprocessing of LC-MS Data
Untargeted metabolomic profiling of CJ and CJ-ELNs was performed by LC-MS. A total of 6 samples (including 3 CJ samples and 3 CJ-ELNs samples) were analyzed using a ultra-high-performance liquid chromatography (UHPLC) system coupled to a Q Exactive HF-X mass spectrometer (Thermo Scientific). Chromatographic separation was performed on an HSS T3 column (maintained at 40°C) with a 12-minute linear gradient from 2% to 98% mobile phase B at a flow rate of 0.3 mL/min. Mass spectrometry (MS) data were acquired in both positive and negative electrospray ionization (ESI) mode (± ESI) using a data-dependent acquisition strategy (top 10 most intense ions). Raw data were first converted to the mzML format using ProteoWizard, followed by processing, using Compound Discoverer 3.3 (Thermo Fisher Scientific) for peak alignment (with maximum retention time shift of 0.5 min and mass tolerance of 10 ppm) and normalization (using the median of maximum peak areas). Compound identification was achieved by matching MS/MS spectra against the following databases: mzCloud, LipidMaps, KEGG, HMDB, and MassBank. The matching criteria were set to a mass tolerance of 10 ppm and a minimum match factor threshold of 10. A four-step filtering workflow was designed to quantitatively identify target compounds as follows (Figure 1).
Filtering of match confidence: compounds with spectral match scores ≥80 were retained [18];Filtering of unique compounds of CJ-ELNs: compounds identified in CJ and CJ-ELNs were compared with isolated compounds unique to CJ-ELNs;Filtering of biological relevance: candidate compounds were screened for associations with wound healing-related signaling pathways using the DeepSeek-Bio model;Semantic recognition and prompt engineering: final candidate molecules were refined through semantic analysis and prompt-based selection.
Common and unique compounds derived from CJ and CJ-ELNs were visualized in a Venn diagram, and a word cloud analysis was conducted via Python. Functions and tools, and databases of key terms used in this study are listed in Table 1.
A four-step filtering workflow. CJ-ELNs: Cayratia japonica exosome-like nanovesicle; LC-MS: liquid chromatography-mass spectrometry.
Construction of a Multimodal Framework of LC-MS Combined With DeepSeek Models
A multimodal framework of LC-MS combined with the DeepSeek-DF model was created, consisting of two major components of the input and output layers. The input layer integrated structural features of compounds (Morgan fingerprints), quantitative features (z score normalization), and literature-derived features (PubMedBERT embeddings). The core architecture was listed in Figure 2. Additionally, the output layer used multitask learning to simultaneously predict wound-healing activity via Sigmoid output and mechanism category via Softmax output.
The core architecture of the input layer.
Interpretability-Based Filtering
The Automated Semantic Recognition (ASR) module and prompt engineering techniques of DeepSeek-R1 32B, as well as web searching were used to interpret the potential biological functions of the screened candidate compound with an annotation of functional labels. A plausibility assessment was then performed based on predefined criteria, including antioxidation, anti-inflammation, anticellular damage, antiapoptosis, wound healing and tissue regeneration, and cell proliferation. Each compound was evaluated and categorized using the following scoring scheme: √ (confirmed), × (not supported), and ? (uncertain). Taking the metabolite (-)-Epicatechin 3-O-gallate as an example, its function, category and possibility in the involvement of wound healing, tissue regeneration, antioxidant, and anticellular damage were predicted via the multimodal framework (Table 2). Following this preliminary filtering, manual curation was conducted to eliminate compounds of nonplant origin and those with low abundances. Ultimately, a refined set of characteristic natural products from CJ-ELNs with potential wound-healing properties was selected.
Results
Acidic Compounds Are Enriched in CJ-ELNs
After conversion and normalization of the raw LC-MS data, a total of 829 and 2212 compounds were identified from CJ and CJ-ELNs. A Venn diagram visualized 1881 specific compounds in CJ-ELNs (Figure 3). “Acid,” as the most frequent term across all entries of metabolite names, was detected by a word cloud analysis (Multimedia Appendix 1). It suggested that acidic compounds were highly enriched in CJ-ELNs.
Enrichment of acidic compounds in CJ-ELNs. (A) A Venn diagram visualizing an intersection of compounds identified from both CJ and CJ-ELNs and unique compounds in CJ-ELNs. CJ: Cayratia japonica; CJ-ELNs: Cayratia japonica exosome-like nanovesicle.
Rapid and Accurate Data Mining of Compounds in CJ-ELNs With Functional Properties
A total of 1881 candidate compounds enriched in CJ-ELNs were functionally annotated and classified using the self-designed multimodal framework of LC-MS combined with DeepSeek models. They were categorized into 20 distinct classes, including organic compounds, alkaloids, amino acids, biomolecules, organic acids, antioxidants, anti-inflammatory agents, energy metabolism-related molecules, phenolics, cytoprotective agents, alcohols, and others. Organic compounds were the leading category of compounds enriched in CJ-ELNs (Figure 4, Multimedia Appendix 2). Functionally, 43.33% (n=39) of compounds enriched in CJ-ELNs possessed the antioxidant property. With the assistance of DeepSeek, we specifically screened compounds enriched in CJ-ELNs with targeted functions of antioxidation, anti-inflammation, anticellular damage, antiapoptosis, wound healing and tissue regeneration, and cell proliferation.
Rapid and accurate data mining of compounds in CJ-ELNs with functional properties. Top 20 classifications of compounds enriched in CJ-ELNs. CJ-ELN: Cayratia japonica exosome-like nanovesicle.
Bioactive Compounds of CJ-ELNs Responsible for Wound Healing and Tissue Regeneration
We estimated the overall expression levels of compounds across the six target functions derived from the DeepSeek model within this multimodal framework, visualizing the results in radar chart format after log2-transformation. (Figure 5). Notably, compounds with the antioxidant function possessed the highest expression levels, proving the antioxidant mechanism of CJ-ELNs in wound repair. Finally, a secondary filtering of compounds with targeted functions was conducted. We manually excluded nonplant–derived compounds, including those of animal origin, synthetic chemicals, and other nonbotanical sources. In addition, compounds with low expression levels in CJ-ELNs were also removed. As a result, a total of 46 naturally active compounds derived from CJ-ELNs with targeted functions were identified (Figure 6 and Multimedia Appendix 3). Citric acid was the most abundant compound with the targeted functions, which was consistent with the finding from the word cloud analysis.
Radar plots visualizing bioactive compounds of Cayratia japonica exosome-like nanovesicles with targeted functions.
Expression levels (log2-transformed) of naturally active compounds derived from Cayratia japonica exosome-like nanovesicle identified by an integration of liquid chromatography-mass spectrometry and DeepSeek models.
Discussion
Principal Findings
This study innovatively integrated DeepSeek models with LC-MS to successfully predict the major natural products of CJ-ELNs responsible for wound healing. DeepSeek’s ASR semantic recognition and prompt engineering worked together to generate initial classification labels. Moreover, an automatic assessment effectively, rapidly, and accurately achieved the goal of data mining of specific compounds for targeted functions.
AI techniques, particularly LLMs, have become an unstoppable force for reshaping medical research [1920]. Traditionally, LC-MS is a powerful analytical technique to identify and quantify active ingredients in traditional Chinese medicine (TCM) herbs. However, a rapid and accurate recognition of compounds with targeted functions, and a quantitative analysis of trace concentrations in complicated samples can be challenging [21]. We expected that an integration of LC-MS and LLMs would benefit TCM research, including the acceleration of active ingredient screen, precise targeting of interested compounds for certain diseases, and anchoring the promising candidates for developing new drugs. DeepSeek is an intelligent system based on a large-scale pre-trained language model, exhibiting strong capabilities in text understanding, knowledge reasoning, and cross-modal collaborative analysis, particularly excelling in processing information within Chinese-language contexts [2223]. It enables rapid processing and analyzing massive volumes of both unstructured and structured data, thus digging biological insights out of complex omics datasets [2425].
In the present study, we first created a four-step filtering workflow and quantitatively identified target compounds from CJ-ELNs by LC-MS. The cloud word analysis emphasized the term of acid among screened compounds enriched in CJ-ELNs. Acidic compounds derived from traditional Chinese herbals are established for the role of clearing heat and detoxifying [26]. Numerous studies have reported that acidic compounds in plants exert antioxidant, antibacterial, and anti-inflammatory effects through mechanisms such as scavenging free radicals, alleviating oxidative stress, modulating inflammatory factors, stimulating fibroblast proliferation, promoting collagen deposition, enhancing epithelialization, and inducing angiogenesis [2728]. To achieve a precise data mining of compounds with relevant functions, DeepSeek models lent a hand that specifically screened compounds in CJ-ELNs with targeted functions of antioxidation, anti-inflammation, anticellular damage, antiapoptosis, wound healing and tissue regeneration, and cell proliferation. Finally, naturally active compounds in CJ-ELNs were resurfaced for their promising potentials in wound repair. For example, studies have shown that baicalin accelerates the wound healing process by downregulating the expression of pro-inflammatory cytokines (IL-6 and IL-1β) while upregulating the anti-inflammatory factor IL-10, and by promoting the secretion of various growth factors (VEGF, FGF-2, PDGF-β, and CTGF) [29]. The combination of LC-MS with DeepSeek paves the way to further analyses of therapeutic targets from traditional Chinese herbs for wound healing and tissue regeneration [3031].
Limitations in this study should be noted. Firstly, bioactive compounds derived from CJ-ELNs were mined via LC-MS and a single LLM, namely, DeepSeek-R1. Other cutting-edge LLMs such as Claude, GPT-4 and Liama [32] can be further analyzed for the assistance of LC-MS in identifying interested compounds. Secondly, the 46 naturally active compounds derived from CJ-ELNs with targeted functions should be validated in in vivo and in vitro experiments. Lastly, the workflow we have established requires further validation on independent datasets. We shall address the aforementioned issues in subsequent work, including evaluating the efficacy of compounds through cell migration and transdermal tissue compatibility assays, verifying their efficacy via macroscopic imaging and H&E staining following animal wound modelling interventions, and validating potential pathways involved through Western blot and immunohistochemical analysis.
Conclusion
We innovatively designed a multimodal framework of LC-MS combined with DeepSeek models that rapidly and accurately identify naturally active compounds from CJ-ELNs. This AI-powered system innovatively integrates the traditional analytical technique with modern large language models, showing a huge potential in modern medicine and TCM research.
Supplementary material
10.2196/80539Multimedia Appendix 1A word cloud of common compounds identified by liquid chromatography-mass spectrometry.
10.2196/80539Multimedia Appendix 2Distribution of the classifications of compounds enriched in CJ-ELNs, distribution of functional compounds enriched in CJ-ELNs with targeted functions of wound healing and tissue regeneration, and distribution of compounds enriched in CJ-ELNs with all functional categories.
10.2196/80539Multimedia Appendix 3Function of 46 compounds.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Subha D Harshnii K Madhikiruba KG Nandhini M Tamilselvi KS Plant derived exosome- like nanovesicles: an updated overview Plant Nano Biology Feb 20233100022 doi 10.1016/j.plana.2022.100022 · doi ↗
- 2Mu N Li J Zeng L et al Plant-derived exosome-like nanovesicles: current progress and prospects Int J Nanomedicine 20231849875009 doi 10.2147/IJN.S 420748 Medline 37693885 PMC 10492547 · doi ↗ · pubmed ↗
- 3Dad HA Gu TW Zhu AQ Huang LQ Peng LH Plant exosome-like nanovesicles: emerging therapeutics and drug delivery nanoplatforms Mol Ther 0120212911331 doi 10.1016/j.ymthe.2020.11.03033278566 PMC 7791080 · doi ↗ · pubmed ↗
- 4Di Gioia S Hossain MN Conese M Biological properties and therapeutic effects of plant-derived nanovesicles Open Med Nov 21202015110961122 doi 10.1515/med-2020-0160 PMC 771864433336066 · doi ↗ · pubmed ↗
- 5Lian MQ Chng WH Liang J et al Plant-derived extracellular vesicles: recent advancements and current challenges on their use for biomedical applications J Extracell Vesicles Dec 20221112 e 12283 doi 10.1002/jev 2.12283 Medline 36519808 PMC 9753580 · doi ↗ · pubmed ↗
- 6Karamanidou T Tsouknidas A Plant-derived extracellular vesicles as therapeutic nanocarriers Int J Mol Sci Dec 242021231 Tepublishdoi 10.3390/ijms 23010191 Medline 35008617 PMC 8745116 · doi ↗ · pubmed ↗
- 7Sun J Zhao P Ding X et al Cayratia japonica prevents ulcerative colitis by promoting M 2 macrophage polarization through blocking the TLR 4/MAPK/NF-κB pathway Mediators Inflamm 202220221108569 doi 10.1155/2022/1108569 Medline 36619207 PMC 9822765 · doi ↗ · pubmed ↗
- 8Zhao X Dai R Wang J et al Analysis of the permeable and retainable components of Cayratia japonica ointment through intact or broken skin after topical application by UPLC-Q-TOF-MS/MS combined with in vitro transdermal assay J Pharm Biomed Anal 01202024238115853 doi 10.1016/j.jpba.2023.115853 Medline 37976992 · doi ↗ · pubmed ↗
