Leveraging Dissolved Organic Matter Collections as a Natural Chemical Library to Link Molecular Traits with Cellular Morphological Responses
Xin Zhang, Mourad Harir, Joel Schick, Marianna Lucio, E. Michael Perdue, Philippe Schmitt-Kopplin

TL;DR
This study explores how dissolved organic matter affects cell shape, linking chemical traits to biological effects using advanced analysis methods.
Contribution
The novel integration of FT-ICR MS and Cell Painting assays to connect DOM molecular traits with cellular morphological responses.
Findings
Humic acids show stronger aliphatic and aromatic traits, while fulvic acids and NOM are more oxidized.
Hydrophobic humic substances induce significant morphological changes in cells.
Lipid-like and nitrogen-rich aromatic compounds are likely responsible for observed cellular effects.
Abstract
Dissolved organic matter (DOM) exhibits a highly complex molecular composition and maintains ecosystem stability, acting as a crucial interface between biotic and abiotic processes. Although DOM’s molecular complexity and biological effects are widely studied, most investigations use targeted bioassays, examining specific responses and linking molecular features only to predefined biological outcomes when assessing potential bioactive components. Here, we analyzed International Humic Substances Society (IHSS) reference standards of natural organic matter (NOM) and humic fractions, including humic acids (HAs) and fulvic acids (FAs), using ultrahigh resolution Fourier- transform ion cyclotron resonance mass spectrometry (FT-ICR MS) alongside the Cell Painting (CP) assay, a multiplexed, image-based morphological profiling method. The chemical composition of IHSS samples was influenced by…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
1
2
3
4- —China Scholarship Council10.13039/501100004543
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBiotin and Related Studies · Cell Image Analysis Techniques · Molecular Sensors and Ion Detection
Introduction
1
Organic matter (OM) is one of the most abundant and reactive reservoirs in global element cycles, linking living and mineral components and present through all earth environments, from atmosphere, waters, soils to deep geosphere,? and has been involved in the evolution of life since planetary formation.? Its variable behavior across these compartments influences global carbon cycling and climate feedbacks, modulated by biological and geochemical or physicochemical stressors. ?−? ? Despite OM’s global importance in climate regulation and Earth’s homeostasis, its chemical composition, behavior toward biology and transformation remain partially understood. ?,? For practical purpose, OM is operationally categorized into dissolved organic matter (DOM) and particulate organic matter (POM) based on filtration methods rather than intrinsic chemical differences.? Among them, DOM has attracted more attention due to its high mobility and reactivity,? which enable it to play a central role in critical ecosystem services such as nutrient cycling, carbon sequestration, pollutant transport, soil fertility, microbial dynamics, and water quality. ?,? Differential fractions within DOM are responsible for key properties and processes, making their chemical characterization fundamental for understanding DOM’s complex composition and biogeochemical functions. ?,? To support research in this area, the International Humic Substances Society (IHSS) has developed a library of standards and reference materials, sourced from representative environments and extracted using consistent protocols to promote comparability and reproducibility in DOM studies. ?,?
DOM originates from the combined effects of biotic and abiotic degradation and transformation of plant and microbial residues across diverse environments. ?−? ? ? Abiotic processes, such as photochemical and chemical oxidation, can occur widely in natural environments and play a major role in the formation and transformation of oxidized natural organic matter (NOM), alongside biological processes. These diverse environmental origins and varying degrees of decomposition result in a complex and heterogeneous composition of DOM, which limits its direct characterization.? Moreover, DOM occurs at low concentrations in natural systems, making extraction and concentration essential for molecular-level analysis.? In aquatic settings, reverse osmosis (RO) techniques have enabled the extraction of natural organic matter (NOM) under mild conditions, typically recovering 80–90% of the total organic carbon.? The RO-extracted materials reflect the functional and structural complexity of DOM and are critical for investigating its ecological behavior.? Also, solid-phase extraction (SPE) using styrene-divinylbenzene polymer (PPL) cartridges has become widely used in recent years due to its simplicity, efficiency, and suitability for field extractions in a broad range of DOM.? To ensure consistency and comparability, IHSS has established standard alkaline extraction protocols for diverse materials. Following these protocols, DOM can be fractionated in standard operation procedures based on solubility in acids and bases into (i) humic acids (HAs), which precipitate at pH lower 2, (ii) fulvic acids (FAs), which remain soluble at all pH levels and (iii) humins, the highly molecular insoluble and hydrophobic organo-mineral fraction. These operational categories effectively represent the nature of DOM, facilitating the study of DOM’s diverse reactivity and environmental functions over decades.?
The complex and heterogeneous composition of DOM fractions also present a significant challenge for analytical techniques. Although conventional chemical analyses yield bulk compositional data, they often do not capture the molecular features necessary to connect structure with ecological or biological function. Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR MS) addresses this limitation by offering unparalleled mass accuracy, resolution, and dynamic range.? It enables nontargeted molecular characterization of thousands of compounds from minimal sample quantities, providing unprecedented insights into DOM composition, reactivity, and transformation pathways.? Such efforts are crucial for identifying consistent patterns in DOM behavior, for establishing molecular indicators of ecosystem processes, and for refining predictive models of OM dynamics under global change. Although FT-ICR MS has been widely applied to the chemical characterization of DOM fractions, comparative studies of DOM fractions from different sources remain limited to case studies, such as single-watershed comparisons, pre- and post-disturbance analyses, depth-resolved soil or peat profiles, or evaluations of extraction-method effects within the same environment.
Meanwhile, DOM fractions are attracting growing attention for their bioactivity. ?,? This interest arises from both the need to understand their ecological functions and from their growing use in humic-based products for green agriculture, as well as in health-related and cosmetic applications. ?,? While DOM’s biogeochemical relevance is well recognized, its potential in biological contexts remains underexplored. Recent studies have revealed that its complex molecular composition can exhibit diverse biological activities, including antiviral, anti-inflammatory, and antibacterial effects, highlighting its potential for medicinal and health-related applications. ?−? ? The polyfunctional and highly complex mixture nature of DOM, however, presents challenges for characterizing its biological effects. Current studies often rely on target-based assays, which may miss off-target effects or provide limited insight into mechanisms of action (MOA).? To overcome these limitations, phenotypic assays have emerged as powerful alternatives. Rather than focusing on predefined targets, they assess the overall effects of treatments on biological systems, allowing target-agnostic activity profiling.? Among these, the Cell Painting (CP) stands out for its high-throughput capability, cost-effectiveness, and ability to generate rich morphological data.? By applying fluorescent dyes to visualize multiple cellular structures, CP provides a nontargeted readout of cellular responses and can detect polypharmacological effects. ?,? It has been successfully applied to assess the bioactivity of various chemicals, gene perturbations, and natural products, ?−? ? making it particularly suited for the complex and multifunctional nature of DOM.
In this study, FT-ICR MS and CP were integrated to perform detailed chemical and morphological profiling of 33 diverse IHSS standard and reference samples, which represent a broad range of DOM fractions from diverse environments. While traditional target-based assays are limited in capturing the multidimensional cellular responses to complex DOM mixtures, our integrated CP-FT-ICR MS approach enables high-content morphological profiling directly linked to molecular composition, providing a more comprehensive understanding of potential bioactive components. This allowed for a systematic evaluation of how origins and extraction methods influence the molecular composition and morphological effects of DOM. More importantly, this study presents a statistical framework for identifying potential bioactive components within DOM fractions by linking chemical profiling with multiparametric bioactivity assay results. Overall, this study bridges the molecular and biological complexity of DOM, offering a novel approach to understanding its functional roles in both environmental and health-related contexts.
Materials and Methods
2
Sample Description
2.1
All samples used in this study were obtained from the IHSS, with source material descriptions available on its homepage (https://humic-substances.org). As summarized in Table S1 and Figure S1, standard materials were collected from the Elliott Series, Pahokee Series, Bowman County (Leonardite), and the Suwannee River, in accordance with the strict IHSS criteria. Reference materials from other sources were prepared following slightly less stringent standards. All samples were dissolved in dimethyl sulfoxide (DMSO) to prepare 10,000 ppm stock solutions. The solutions were thoroughly vortexed, sonicated, and centrifuged, and the resulting stock solutions were stored at −20 °C for subsequent experiments.
Molecular Characterization of IHSS Samples
2.2
The molecular compositions of IHSS samples were analyzed using a high-field Fourier-transform ion cyclotron resonance mass spectrometer equipped with electrospray ionization (ESI) operated in negative ion mode. Thousands of signals were translated into elemental formulas and visualized using van Krevelen-type diagrams (H/C vs O/C) and related plots (H/C vs m/z), with each formula represented as a dot. Dot size indicates relative abundance, providing a visual overview of oxygenation (O/C) and saturation (H/C) across molecular classes (CHO, CHNO, CHOS, and CHNOS). Detailed instrumental parameters, calibration procedures, and data processing workflows are provided in Text S1.
Cell Line and Cell Culture
2.3
U2OS cells were cultured in Dulbecco’s Modified Eagle Medium (DMEM; Gibco, cat. 31885-023) supplemented with 10% fetal bovine serum (Gibco, cat. 10270-106), 1% Penicillin–Streptomycin (Gibco, cat. 15140-122), and 1% nonessential amino acids (NEAA; Gibco, cat. 11140-035). Cells were maintained in the incubator of 95% humidity and 5% CO_2_ atmosphere and were cultured up to passage 8.
Cell Painting
2.4
The CP was performed following the originally published protocol.? U2OS cells were seeded into a Poly-D-Lysine -coated 384-well plate (PerkinElmer, cat-781091) at a density of ∼1000 cells per 50 μL per well and incubated under standard culture conditions for 24 h. Cells were then exposed to various samples. Briefly, stock solutions of each sample were diluted 100-fold in culture medium and used to replace the existing medium in each well, ensuring a final DMSO concentration of 1% across all treatments. Each sample was measured in six replicates to ensure reproducibility.
After 24 h of incubation, live cells were first labeled with Mito Tracker^TM^ Deep Red, followed by fixation with 4% paraformaldehyde (PFA) and permeabilization. Cells were then stained with five additional fluorescent dyes for 30 min. In total, eight cellular components were labeled in this assay. Detailed information on the fluorescent dyes is provided in Table S2.
Fluorescent images were collected from five channels using the CellInsight^TM^ CX7 high-content analysis platform (Thermo Fisher Scientific), equipped with a 20× objective (Olympus^TM^, 0.4 NA). Image analysis was conducted using Compartmental Analysis BioApplication and Morphology Explorer (Cellomics, Thermo Fisher Scientific) to extract morphological features from individual cells. These features describe the number, fluorescence intensity, area, and shape of different stained cellular compartments, and collectively constitute a quantitative representation of cellular morphology. Well-level morphological features were then generated by aggregating these single-cell measurements. For each well, the mean of each single-cell feature was calculated, and additional descriptors including the variance and the correlations among these mean values were calculated to capture population-level changes in morphology. In total, 272 well-level morphological features were obtained and subsequently used for multivariate analysis and data integration analysis in this study. Detailed information about the name, analysis modules, imaging channels, and feature categories of these features are provided in Table S3. Cell-level features were retained separately for evaluating single-cell distributions. For both well-level and cell-level measurements, feature values were normalized to the corresponding vehicle controls (cells treated with 1% DMSO on the same measurement plate) using the robust MAD (rMAD) method to reduce technical variability. ?,? Specifically, the median of the feature values from the vehicle controls was subtracted from each treatment value, and the result was then divided by the median absolute deviation (MAD) of the control group. The normalized morphological features were subsequently expressed as z-scores and used for downstream analysis.
To assess the autofluorescence of samples and their potential interactions with CP dyes, a parallel experiment was conducted under CP conditions but without cells. Fluorescence was measured for individual samples, dyes, and their mixtures at four concentrations (1, 10, 50, and 100 ppm). Readouts were obtained using an EnVision 2104 Multilabel Plate Reader (PerkinElmer), applying the same wavelengths used in the CP. Sample-derived autofluorescence was evaluated by subtracting the fluorescence of each sample to that of phosphate-buffered saline (PBS). Differences in fluorescence between dye-sample mixtures and dyes alone were used to evaluate sample-induced effects on dye fluorescence.
Statistical Analyses and Data Visualization
2.5
All data processing, statistical analyses, and data visualization were carried out using R (version 4.4.1) and Python (version 3.9.13). Hierarchical clustering analysis (HCA) was performed with the Ward.D2 method and Euclidean distance using the ComplexHeatmap package (version 2.20.0). Principal component analysis (PCA) was conducted with FactoMineR (version 2.6) and Factoextra (version 1.0.7).
Sparse Partial Least Square (sPLS) was performed using mixOmics (version 6.28.0) in regression mode to explore the association between the FT-ICR MS and CP data sets. For data preprocessing, CP data were summarized by calculating the median well-level morphological profiles of six replicates for each sample. FT-ICR MS peak intensities were first normalized to the total intensity of each sample, then scaled to unit variance across samples for each peak. The resulting FT-ICR MS and CP data sets were merged and subjected to z-score normalization by sample. The sPLS model was optimized by selecting the number of components that maximized the Q2.total through 10-fold cross-validation repeated 5 times. A Q2.total threshold of 0.0975 was used as a cutoff to determine the number of components retained in the model. Components with Q^2^.total values above this threshold were considered to provide nonrandom predictive contribution under cross-validation, whereas components below this threshold were excluded from interpretation.? Feature selection was performed using lasso penalization on the loading vectors, selecting the subset of variables that maximized the cross-validated correlation between the two data sets. Among the selected FT-ICR MS features, a subset associated with strong morphological effects was further analyzed using mass network analysis in Gephi (version 0.10.1). This network was constructed based on exact mass differences to reveal potential molecular connections and transformation pathways.
Results
3
Profiling the Molecular Composition of IHSS
Samples
3.1
Comprehensive molecular profiling of 33 IHSS samples was conducted using FT-ICR MS, with materials collected from diverse environments including soil, peat, lignite (coal), and inland freshwater. The samples were categorized into terrestrial (HAs and FAs) and aquatic (HAs, FAs, and NOM) groups, enabling an in-depth assessment of how environmental origins and extraction protocols influence the molecular composition of DOM fractions (Table S1). A unified solvent system, dissolving samples in DMSO and diluting with methanol, was used to facilitate consistent comparisons across all DOM fractions. This approach effectively addressed solubility challenges, particularly the limited resolubility of HAs postextraction.? The molecular compositions were grouped by the number of oxygen (O), nitrogen (N), and sulfur (S) atoms to explore heteroatom compositions, revealing the molecular diversity across environments. The relative abundance of these heteroatom classes varied across samples (Figure S2), with terrestrial HAs and Pony Lake FA (PLFA) containing fewer oxygen atoms, indicating lower oxidation. Elevated levels of N- and S-containing compounds were observed in terrestrial HAs, terrestrial FAs, and PLFA, suggesting that environmental factors dominate the variability in DOM’s N- and S-content. The chemical composition of the IHSS samples was subsequently examined by classifying assigned formulas based on their elemental composition and predefined compound classes (Figurea). The elemental composition demonstrated a strong source dependence, with terrestrially derived samples exhibiting a notable presence of N-containing constituents in all fractions, while aquatic samples were characterized by CHO formulas, except for PLFA, which was enriched in N- and S-containing compounds. Compound class distribution was influenced by both environmental sources and fractionation schemes. Highly unsaturated and phenolic compounds dominated NOM, FAs, and aquatic HAs, whereas soil HAs were enriched in aromatic and aliphatic compounds, and PLFA showed a relatively higher proportion of aliphatic compounds.
Molecular characterization of International Humic Substances Society (IHSS) samples revealed by Fourier-Transform Ion Cyclotron Resonance Mass Spectrometry (FT-ICR MS). (a) Stacked bar plots showing the proportions of assigned formulas categorized by elemental composition (left panel: CHO, CHNO, CHOS, CHNOS) and by compound class (right panel), including highly unsaturated and phenolic compounds (AImod ≤ 0.5 and H/C < 1.5), aromatic compounds (0.5 < AImod ≤ 0.67), polycyclic aromatic compounds (AImod > 0.67), and aliphatic compounds (AImod ≤ 0.5 and H/C ≥ 1.5). (b) Intensity-weighted average hydrogen-to-carbon (H/C) versus oxygen-to-carbon (O/C) ratios for each sample. (c) Intensity-weighted double bond equivalent (DBE) versus H/C ratios for each sample.
Van Krevelen diagrams (Figureb) provided further insights into molecular compositions, with terrestrial HAs exhibiting a strong presence of aromatic and condensed compounds, reflected by low hydrogen-to-carbon (H/C) and oxygen-to-carbon (O/C) ratios, characteristic of their stable, aromatic nature. In contrast, aquatic fractions, Waskish Peat HA and Waskish Peat FA (WPHA and WPFA), and terrestrial FAs were more oxygenated (oxidized) and aliphatic, showing higher H/C and O/C ratios. Among these, terrestrial FAs had the highest oxygenation state, which may reflect the introduction of oxygen-rich functional groups during microbial alteration and partial photo-oxidation. ?,? The comparison of average double bond equivalents (DBE) against H/C ratios (Figurec) showed a clear shift from aromatic terrestrial HAs to more aliphatic aquatic FAs, reflecting differences in environmental sources and fractionation methods. WPHA and WPFA aligned more closely with aquatic materials, consistent with their origin in nutrient-poor, cold bog peat,? in contrast to the agricultural peat environment of Pahokee Peat.? These patterns support previous studies, ?,? confirming the aromatic nature of HAs and the oxygen-rich character of FAs and NOM.
Multivariate Analysis for Identifying Molecular
Patterns of IHSS Samples
3.2
Hierarchical clustering analysis (HCA) was applied to assess the compositional patterns of IHSS samples based on their assigned molecular compositions. The resulting dendrogram (Figure S3a) identified four clusters. Cluster 1 contained all terrestrial HAs except WPHA. Cluster 2 consisted exclusively of PLFA. Cluster 3 grouped all terrestrial FAs except WPFA. Cluster 4 included all aquatic fractions together with WPHA and WPFA. A detailed list of the samples within each cluster is provided in Table S4. To reinforce these observations, we next performed principal component analysis (PCA). The PCA score plots (Figure S3b) showed strong agreement with the HCA results, providing consistent evidence for the sample clusters identified in HCA. Principal component 1 (PC1) effectively separated terrestrial HAs from other sample types, while PC2 further distinguished terrestrial FAs from aquatic fractions as well as WPHA and WPFA. PC3 highlighted the unique molecular signature of PLFA. Interestingly, the variance along PC1 and PC2 indicated partial molecular similarities between PLFA and terrestrial FAs. Since HCA and PCA revealed consistent cluster patterns, PCA loadings were used to identify the molecular features driving these clusters. Samples were divided into four groups based on their positions in the PCA score space, and for each group, molecular formulas with loading directions corresponding to the quadrant of the group were extracted. For instance, formulas with negative PC1 loadings were selected for terrestrial HAs in the negative PC1 region, and formulas with positive loadings on both PC1 and PC3 were selected for PLFA in the quadrant with positive PC1 and PC3 scores. These group-specific molecular signatures were then visualized using van Krevelen and mass-edited H/C ratio diagrams (Figure), revealing distinct compositional patterns that contribute to the observed clustering.
Van Krevelen diagrams showing hydrogen-to-carbon versus oxygen-to-carbon ratios, alongside mass-edited hydrogen-to-carbon ratio plots displaying hydrogen-to-carbon ratios against the molecular weight of compositions contributing to sample separation in the Principal Component Analysis: (a) Terrestrial humic acids (HAs), (b) Pony Lake fulvic acid (FA), (c) Terrestrial fulvic acids except Waskish Peat fulvic acid (FAs), and (d) Aquatic fractions, Waskish Peat humic acid and Waskish Peat fulvic acid (Waskish Peat HA & FA). Molecular classes are color-coded as follows: CHO (blue), CHNO (orange), CHOS (green), and CHNOS (red). Bubble sizes are proportional to mass peak intensity. The general classification of major molecular classes in the van Krevelen diagrams is provided in Figure S4, showing typical regions for fatty acids, carbohydrates, lignins, tannins, and condensed aromatics based on their H/C and O/C ratios.
Based on these PCA-derived molecular signatures, compositional differences emerged among the sample clusters. Terrestrial HAs, terrestrial FAs, and PLFA showed higher levels of N- and/or S-containing compounds compared to aquatic samples (Figurea–c), consistent with previous studies on terrestrial organic matter. ?−? ? Terrestrial HAs were characterized by aliphatic compounds, lignin-like compounds, and black carbon (BC)-type condensed aromatics (Figurea), typical of humic substances observed in soils.? In contrast, PLFA exhibited a higher abundance of reduced molecular compositions, with molecular weights primarily below m/z 600 (Figureb). Terrestrial FAs, aquatic fractions, WPHA and WPFA were enriched in oxygen-rich, tannin-like compounds (Figurec,d). Within these samples, terrestrial FAs exhibited a markedly higher proportion of highly oxygenated tannins and N- and S-containing molecules, suggesting microbial transformations,? whereas aquatic samples contained these compounds at lower relative abundance (Figure S5).
Cell Painting for Morphological Profiling
of IHSS Samples
3.3
The CP assay was employed to evaluate the morphological effects of IHSS samples. Given the complexity of these samples, we first assessed their autofluorescence and potential interactions between the samples and fluorescent dyes to evaluate the reliability of morphological readouts. As shown in Figure S6a, varying levels of autofluorescence were observed, which are consistent with previous studies.? HAs demonstrated stronger fluorescence quenching than FAs and NOM across multiple dyes and concentrations (Figure S6b), agreeing with earlier findings. ?−? ? The quenching effect was more pronounced at lower concentrations, likely due to the colloidal behavior of humic substances, where aggregation at higher concentrations limits their interaction with dyes, while dispersion at lower concentrations enhances it.?
Based on this evaluation, CP was performed on all IHSS samples at a final concentration of 100 ppm. The workflow of the CP assay is illustrated in Figure S7. Fluorescent images were visually examined to ensure accurate identification of cellular compartments, and morphological features strongly influenced by fluorescence intensity were excluded from further analysis. The resulting well-level morphological profiles were analyzed using PCA, revealing a progressive separation from vehicle controls, with samples ordered from NOM to FAs to HAs and from aquatic to terrestrial sources (Figurea), indicating increasing morphological activity. Consistently, the Global Euclidean Distance metric, which quantifies deviations from controls, differentiated IHSS samples into those with high CP activity (more morphologically active) and those with low CP activity (less morphologically active), revealing that terrestrial HAs exhibited greater morphological activity (Figure S8). This pattern is consistent with a recent study of HAs exhibiting greater antiviral activity than FAs,? challenging the conventional assumption that FAs are more reactive due to their higher oxygen content.? Notably, despite its high heteroatom content, PLFA did not show significant differences in morphological effects compared to other aquatic FAs. Further HCA of the median well-level morphological profiles revealed a consistent reduction in nuclear area and diameter in samples with higher CP activity (Figure S9). Single-cell distribution analysis supported these findings, showing minimal changes for treatments with low CP activity and a clear shift toward smaller nuclear areas and diameters in treatments with high CP activity, which are consistent with apoptosis-like or genotoxic responses, as reductions in nuclear size often reflect chromatin condensation and early cytotoxic stress (Figureb). Representative fluorescent images demonstrated nuclear shrinkage in response to terrestrial HA exposure (Figure S10). Although cell shrinkage was observed in different treatments, it was not initially recognized due to its common occurrence (see cell area panel in Figureb). Both nuclear and cell shrinkage are morphological hallmarks of apoptosis, and are often linked to DNA damage.? The results suggest that terrestrial HAs may possess genotoxic potential, supporting previous studies that link HAs to apoptosis. ?−? ?
Morphological profiling of IHSS samples using the Cell Painting (CP). (a) Principal component analysis of well-level morphological profiles from U2OS cells exposed to IHSS samples. Both panels display the same score plot, colored by fraction types (left) and source materials (right). Vehicle controls (cells treated with 1% DMSO) are shown in gray. (b) Single-cell distributions showing differences in nuclear area, nuclear diameter, and cell area among controls, high-activity, and low-activity samples. For both well-level and cell-level measurements, morphological features were centered by subtracting the median of the control population and scaled by dividing by the corresponding median absolute deviation (MAD).
Integrating Chemical and Morphological Profiles
to Identify Active Components in IHSS Samples
3.4
We employed a multivariate integration approach to identify the morphologically active components in IHSS samples by linking their chemical and morphological profiles. Sparse partial least-squares (sPLS) was selected for its ability to highlight the causal relationship between predictor variables and response variables.? Model performance was assessed using the cross-validated Q2.total criterion based on repeated 10-fold cross-validation (5 repeats). A Q2.total threshold of 0.0975 was used to determine the number of components to retain.? Only the first component exceeded this cutoff (Q2.total = 0.1333), whereas all subsequent components fell below the threshold. Therefore, the sPLS model was fitted using only the first component. Feature selection retained 2800 FT-ICR MS chemical features (X) and 80 CP morphological profiles (Y). On the first component, the fitted sPLS model explained 22% of the variance in the chemical features (X) and 33% of the variance in the morphological profiles (Y). Samples were projected onto the latent space defined by the integrated chemical and morphological profiles (Figurea). The first latent component (XY-variate
- effectively distinguished terrestrial humic acids (HAs) from other IHSS fractions.
Integration of chemical and morphological profiles to identify morphologically active components in IHSS samples. (a) Projection of IHSS samples onto the first latent component (XY-variate 1) defined by integrated chemical (X: FT-ICR MS) and morphological (Y: Cell Painting) profiles. Samples projecting onto the negative axis of XY-variate 1 are labeled in blue, whereas samples projecting onto the positive axis are labeled in red. (b) Projection of FT-ICR MS chemical features (X) selected by the sPLS model onto XY-variate 1. Chemical features with negative loadings on XY-variate 1 are colored in blue, while those with positive loadings are colored in red, illustrating their contributions to the observed sample separation. (c) Van Krevelen (H/C vs O/C) and mass-edited H/C vs m/z diagrams depicting the chemical space of sPLS-selected features projected onto the negative axis of XY-variate 1. Two distinct compositional regions are observed, corresponding to lipid-like compounds and black-carbon-type nitrogen-rich aromatic compounds. Molecular classes are color-coded as CHO (blue), CHNO (orange), CHOS (green), and CHNOS (red), with bubble sizes proportional to mass spectral intensity.
FT-ICR MS chemical features selected by the sPLS model were projected onto XY-variate 1 to visualize their contributions to the observed sample separation (Figureb). We focused on the chemical features projected onto the negative axis of XY-variate 1 because these features were enriched in samples projecting in the same region of the latent space, corresponding to terrestrial humic acids, and were associated with morphological changes. The sPLS-selected molecular compositions projected onto the negative axis of XY-variate 1 clustered into two distinct regions on the van Krevelen diagram, corresponding to lipid-like and BC-type condensed aromatic compositions (Figurec). These molecular compositions exhibited moderate molecular weights and were enriched in N-containing species. To further explore the characteristics of these molecular compositions, we applied mass difference network analysis and Kendrick mass defect (KMD) analysis. The mass difference network revealed relationships based on exact mass differences, offering insights into transformation pathways.? Meanwhile, KMD analysis detects homologous series differing by defined chemical functional groups and facilitates the identification of compound classes based on their repeating units. A mass difference network was constructed by matching the exact mass differences among all selected molecules with a predefined list of 40 mass differences corresponding to common functional groups and molecular transformations. Table S5 lists these transformations, showing each formula change and its chemical interpretation. The table includes common modifications observed in natural organic matter, such as alkylation, carbon addition, acylation, carbonylation, oxidation, hydroxylation, carboxylation, methylation, amino-related transformations, and diazomethylation, capturing both core structural modifications and functional group changes that drive molecular diversity and reactivity patterns. Analysis of the network (Figure S11a) revealed that the 20 most frequent mass differences were dominated by CH_2_–derived alkyl units (CH_2_, C_2_H_4_) and redox-related oxygen modifications (CO, O) (Figure S11b), indicating that alkylation and redox processes are the primary drivers of molecular transformations within both subnetworks. KMD analysis further revealed a strong presence of KMD (CH_2_) and KMD (O) homologous series in subnetwork 1, and primarily KMD (CH_2_) series in subnetwork 2 (Figure S12). The enrichment of KMD (CH_2_) series suggests a high presence of alkylated structures. Meanwhile, the prevalence of KMD (O) series, together with the previously observed abundance of N- containing compounds, highlight the role of heteroatom incorporation in modulating polarity.
Discussion
4
Molecular Insights into DOM Fractions from
FT-ICR MS
4.1
FT-ICR MS analysis provided detailed molecular-level insights into the IHSS samples, revealing how environmental origins and extraction protocols significantly alter their molecular compositions. ?−? ? Terrestrial fractions (including both HAs and FAs) and PLFA were characterized by elevated levels of N-containing compounds, suggesting stronger microbial influence, as dissolved organic nitrogen (DON) is produced directly by microbial turnover and indirectly by extracellular enzymes released by microbes. ?,? The high abundance of N-containing compounds thus reflects enhanced biological transformation of these DOM fractions. Specifically, PLFA exhibited a higher abundance of aliphatic molecular compositions, highlighting its microbial-dominated origin.? Due to the lack of higher plant inputs in Pony Lake, its DOM originates almost entirely from microbial sources.? Accordingly, microbial exudation and cell lysis introduce membrane lipids into the DOM pool of Pony Lake, enriching it with aliphatic-rich compounds. ?,? In comparison, aquatic DOM was enriched in CHO compounds with higher oxygenation states, such as tannins, reflecting allochthonous inputs from terrestrial DOM, limited transformation due to short residence times, and increased photooxidation. ?,? Extraction methods further separated materials into different fractions based on their physiochemical properties, particularly solubility. Soil HAs, precipitated upon acidification, were characterized by aliphatic and aromatic compounds, while soil FAs, remaining in solution, contained more oxidized and polar molecules, reflecting their greater solubility. Meanwhile, aquatic HAs and FAs showed greater similarity in chemical composition, likely because IHSS uses XAD-8 resin adsorption to preferentially retain the hydrophobic fraction from aquatic materials before alkaline extraction, rather than the direct alkaline extraction employed for solid materials.? It should be noted that no extraction method can fully recover the chemical nature of DOM,? and our results highlight the significance of considering both environmental sources and extraction methods in the molecular characterization of DOM.
Morphological
Profiling of DOM Fractions Using CP
4.2
Translating these molecular-level differences into functional biological outcomes remains challenging. The complexity of assessing the biological activity of DOM fractions is compounded by the limitations of conventional bioassays in resolving such mixtures and identifying bioactive components.? Phenotypic screening platforms, such as the CP, provide a promising alternative by capturing subtle morphological changes that reflect cellular responses to various perturbations.? While CP has proven effective in natural products research, ?,? its current application to DOM was hindered by issues such as strong autofluorescence and fluorescence quenching, particularly in HAs. Attempts to reduce material quantities further worsened signal loss, and current separation techniques were unable to remove the aromatic and carbohydrate components responsible for these effects.? These findings underline the importance of considering intrinsic optical properties when applying fluorescence-based assays to complex mixtures like DOM.
Despite these challenges, the CP successfully captured distinct morphological profiles in response to IHSS samples. Although all samples were tested at a relatively high concentration (100 ppm), the primary objective was to induce detectable morphological changes, thereby enabling direct comparison of the bioactive potential among different samples under the same conditions and allowing integration with FT-ICR MS data to explore their potential bioactive components. The CP results revealed a clear trend in morphological activity, with increased activity observed from NOM to FAs to HAs, and from aquatic to terrestrial sources. These patterns suggest that both the extraction methods and environmental origins of DOM influence its cellular responses. The elevated activity of terrestrial HAs aligned with previous studies highlighting their antiviral properties,? while challenging the assumption that a higher abundance of functional groups in FAs correlates with greater reactivity.? Instead, broader molecular traits such as aromaticity and nitrogen content may play a more significant role in driving biological effects, as aromatic rings and nitrogen atoms can interact with biological targets through noncovalent interactions and hydrogen bonding, respectively. ?,? Exposure to terrestrial HAs led to reduced nuclear size and cell area, key features of apoptosis-like processes associated with genotoxic response.? This evidence aligns with previous studies linking HAs to DNA damage and apoptotic pathways in mammalian cells, ?−? ? demonstrating the value of standardized IHSS samples in exploring the biological impacts of complex environmental mixtures. Importantly, these findings were obtained at concentrations exceeding those typically observed in natural environments, and their applicability under environmentally relevant conditions requires further validation at concentrations representative of natural systems.
Data Integration to Reveal Potentially Active
Components in DOM Fractions
4.3
sPLS was applied to investigate the association between chemical compositions and morphological outcomes, and the first sPLS component effectively separated terrestrial HAs from other samples. Chemical features that covaried with terrestrial HAs and were associated with morphological changes clustered into two distinct regions in the van Krevelen diagram, corresponding to lipid-like compounds and BC-type condensed aromatic compounds. These classes of compounds aligned with known antiviral and bioactive fractions of humic substances, characterized by hydrophobic, aromatic-rich, and polyanionic fractions, ?,? emphasizing their biological significance. Mass difference and KMD analyses provided additional insight into molecular relationships and transformation pathways among these components. ?,? Frequent mass differences related to alkylation and redox processes, along with patterns observed in CH_2_- and O-based KMD series, highlighted the role of alkylated and oxygenated molecular compositions in influencing cellular responses. Alkylation likely increases molecular amphiphilicity and lipophilicity, thereby facilitating membrane permeability.? Meanwhile, the presence of nitrogen and oxygen atoms increases the hydrogen-bonding capacity, enabling specific interactions with cellular targets including DNA or nuclear proteins. ?,? Consequently, the combination of high lipophilicity and strong hydrogen-bonding potential contributes to the pronounced morphological activity of terrestrial HAs, indicating that molecular characteristics rather than solely functional group counts are key in determining biological effects. Nevertheless, these results are based on sPLS correlations and high assay concentrations and should be considered as indications of potential bioactive components in IHSS samples that merit further isolation and characterization.
Nitro-polycyclic aromatic hydrocarbons (nitro-PAHs) likely occur within the BC-type condensed aromatic fraction identified by sPLS analysis. Although BC is often associated with combustion-derived sources, noncombustion pathways such as biomass oxidation can contribute substantially.? Nitro-PAHs which can form via combustion, are well-known as environmental contaminants capable of inducing DNA damage, ?−? ? supporting their potential role in the observed morphological changes. These considerations make nitro-PAHs plausible contributors to the observed morphological changes. BC is an emerging pollutant of rising concern in aerosols and poses significant health risks through inhalation, linked to cardiovascular and respiratory problems, including premature death, and can exacerbate conditions like asthma.? Recent studies further associate fine particles like PM2.5, which often contain BC, with cognitive decline and neurodegenerative diseases, including Alzheimer’s and Parkinson’s disease. ?−? ? ? Unlike physical interactions or electrostatic forces between fine particles, nitro-PAHs and derivatives may exert their effects primarily through covalent binding. ?,? Fine particles often carry surface charges that influence how they aggregate or interact with other components. This distinction emphasizes the importance of recognizing specific interaction mechanisms when evaluating the biological impacts of DOM. Consequently, there is a clear need for future targeted studies to identify and quantify nitro-PAHs and similar compounds within DOM fractions to better understand their roles and effects.
Environmental Implications
4.4
Our findings highlight specific molecular features within IHSS samples that influence cellular responses, providing insight into the potential bioactivity of DOM. While many chemical characteristics of DOM have been reported previously, the integration of high-content morphological profiling via the Cell Painting (CP) assay with statistical correlations to molecular properties offers a complementary perspective that has not been systematically explored. It is important to note that FTICR-MS is semiquantitative and biased toward strongly ionizable compounds, so the identified correlations indicate potential bioactive molecules rather than a comprehensive mapping of all bioactive species. Although FT-ICR MS intensities remain comparable across samples when preparation and analytical conditions are consistent, certain expected associations between biomolecules and morphological changes, such as those involving carbohydrate-like components, are absent in the current results. This may arise from analytical constraints (e.g., low ionization efficiency) or biological factors (e.g., limited cellular permeability). Therefore, the present work should be considered as an exploratory work, and further studies using complementary analytical techniques (e.g, Nuclear magnetic resonance) and additional multiparametric profiling assays (e.g, L1000) are encouraged to expand the application of current statistical strategy to fully explore the bioactive space of DOM.
All samples were tested at relatively high concentrations, exceeding environmentally relevant levels, to ensure measurable morphological changes. This approach was required due to the limited sensitivity of the CP assay when applied to complex environmental mixtures. Single-cell analysis is expected to improve resolution by capturing subtle and heterogeneous responses. Although the absolute magnitude of responses would likely decrease at lower concentrations, the relative bioactivity ranking remained consistent, with terrestrial humic acids exhibiting the strongest effects, particularly those associated with lipid-like compounds and nitrogen-rich black carbon–type molecules.
To prevent misinterpretation, these associations should not be taken as evidence that terrestrial humic acids in general pose health or environmental risks. Observed responses arise from specific molecular components within the IHSS samples studied. Nitro-PAH substituents are not inherent to all terrestrial humic substances, and their presence depends on environmental contamination and material processing. Established agricultural, industrial, and pharmaceutical uses of humic substances typically involve purification and quality control measures that limit such contaminants. The results therefore highlight molecular components that merit further investigation rather than indicating hazards associated with humic materials.
Supplementary Material
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Friedlingstein P.O’sullivan M.Jones M. W.Andrew R. M.Bakker D. C.Hauck J.Landschützer P.Le QuéréC.Luijkx I. T.Peters G. P.Global carbon budget 2023 Earth Syst. Sci. Data 202315125301536910.5194/essd-15-5301-2023 · doi ↗
- 2Lauretta D. S.Connolly H. C.Jr Aebersold J. E.Alexander C. M. O. D.Ballouz R. L.Barnes J. J.Bates H. C.Bennett C. A.Blanche L.Blumenfeld E. H.Asteroid (101955) Bennu in the laboratory: Properties of the sample collected by OSIRIS-R Ex Meteorit. Planet. Sci.20245992453248610.1111/maps.14227 · doi ↗
- 3Jackson R. B.Lajtha K.Crow S. E.Hugelius G.Kramer M. G.Piñeiro G.The ecology of soil carbon: pools, vulnerabilities, and biotic and abiotic controls Annu. Rev. Ecol. Evol. Systemat.201748141944510.1146/annurev-ecolsys-112414-054234 · doi ↗
- 4Guo B.Liu Y.Wang J.Zheng Q.Shi Q.He C.Jiao N.Environmental and microbial factors shape dissolved organic matter across multiple ecosystems Commun. Earth Environ.20256191710.1038/s 43247-025-02848-3 · doi ↗
- 5Wang M.Zhang S.Guo X.Xiao L.Yang Y.Luo Y.Mishra U.Luo Z.Responses of soil organic carbon to climate extremes under warming across global biomes Nat. Clim. Change 20241419810510.1038/s 41558-023-01874-3 · doi ↗
- 6Perdue, E. ; Ritchie, J. Dissolved organic matter in freshwaters. In Treatise on Geochemistry; Elsevier, 2003; Vol. 5, p 605.
- 7Regnier P.Resplandy L.Najjar R. G.Ciais P.The land-to-ocean loops of the global carbon cycle Nature 2022603790140141010.1038/s 41586-021-04339-935296840 · doi ↗ · pubmed ↗
- 8Hertkorn N.Ruecker C.Meringer M.Gugisch R.Frommberger M.Perdue E.Witt M.Schmitt-Kopplin P.High-precision frequency measurements: indispensable tools at the core of the molecular-level analysis of complex systems Anal. Bioanal. Chem.200738951311132710.1007/s 00216-007-1577-417924102 PMC 2259236 · doi ↗ · pubmed ↗
