Mosses ML: Machine-Learning-Enhanced Biomonitoring of Emerging Contaminants Using Hylocomium splendens: An Integrated Approach Linking Atmospheric Deposition, Trace Metals, and Predictive Risk Assessment
Grzegorz Kosior, Kacper Matik, Monika Sporek, Zbigniew Ziembik, Antonina Kalinichenko

TL;DR
This paper introduces Mosses ML, a machine learning framework that improves the detection and risk assessment of atmospheric pollutants using mosses as bioindicators.
Contribution
The novel contribution is the integration of machine learning with moss biomonitoring to enhance predictive and mechanistic insights into atmospheric contamination.
Findings
ML models achieved high predictive accuracy (R^2^ up to 0.91) in estimating moss metal concentrations from deposition metrics.
Dry deposition load and co-occurring metal signals were identified as the main predictors of contamination.
The ML approach improved high-risk site identification by 24–38% compared to traditional methods.
Abstract
Atmospheric deposition of emerging contaminants, including toxic trace elements, remains a critical environmental and public health concern. Moss biomonitoring offers a sensitive and cost-effective tool for assessing airborne pollutants, yet traditional analyses rely on descriptive statistics and lack predictive and mechanistic insights. Here, we introduce Mosses ML, a machine-learning-enhanced framework that integrates moss biomonitoring with bulk and dry deposition measurements to improve detection, interpretation, and risk assessment of atmospheric contaminants. Using Hylocomium splendens transplants exposed for 90 days across industrial, urban, and rural sites in Upper Silesia (Poland), we combined trace element accumulation (Cd, Pb, Zn, Ni, Cr, Fe), relative accumulation factors (RAFs), PCA-derived gradients, and site-level metadata with Random Forest and Gradient Boosting models.…
1. Introduction
Monitoring emerging contaminants in the atmosphere remains a central challenge for environmental toxicology, public health, and evidence-based risk assessment. Rapid industrialization, urban expansion, and energy production have intensified emissions of toxic trace elements such as cadmium (Cd), lead (Pb), nickel (Ni), chromium (Cr), and zinc (Zn). All of these pose ecological and human-health risks because of their persistence, bioavailability, and capacity for long-range atmospheric transport [1,2]. Traditional instrumental approaches for assessing airborne contaminants, including air filters and precipitation collectors, provide high analytical precision. However, they are costly, labor-intensive, and often unable to capture fine-scale spatial variability [3]. These limitations have led to the widespread adoption of terrestrial mosses as efficient biomonitors of atmospheric deposition within international initiatives such as UNECE ICP Vegetation [4].
Previous studies on moss biomonitoring of atmospheric trace metals have predominantly relied on classical statistical approaches to describe spatial pollution gradients and deposition–accumulation relationships. These include analysis of variance (ANOVA), correlation analysis, and principal component analysis (PCA). While effective for descriptive assessments and source attribution, these methods are limited when datasets exhibit nonlinear relationships, strong multicollinearity among co-occurring elements, or complex interactions between wet and dry deposition pathways. Classical statistical frameworks also provide little predictive capability and only limited insight into the relative importance of interacting environmental drivers. In contrast, machine-learning approaches such as Random Forest and Gradient Boosting capture nonlinear dependencies, accommodate high-dimensional datasets, and enable robust prediction of metal concentrations. Explainable machine-learning tools, including SHAP analysis, further allow model outputs to be interpreted in terms of individual feature contributions. This links predictive performance with process-level understanding of atmospheric deposition and bioaccumulation in mosses.
Bryophytes, particularly pleurocarpous mosses such as Hylocomium splendens and Pleurozium schreberi, lack a root system and obtain nutrients predominantly from atmospheric inputs. This makes them sensitive integrators of both wet and dry deposition [5,6]. Their high cation-exchange capacity, surface roughness, and morphological complexity facilitate the interception and retention of airborne particulate matter, including poorly soluble metal-bearing particles [7]. Active biomonitoring using transplanted mosses further reduces environmental variability and allows controlled exposure periods, improving comparability across sites and time [1,8]. Moss biomonitoring has been applied extensively in terrestrial environments, but analogous accumulation mechanisms have also been documented for aquatic mosses. These efficiently integrate dissolved and particulate contaminants from water and exhibit comparable metal-binding and accumulation properties, albeit driven by different exposure pathways and hydrological controls [9]. Studies from Central Europe, including Upper Silesia, have shown pronounced spatial gradients in moss chemistry and elevated concentrations of industrially derived trace elements. These are especially marked in areas influenced by mining, metallurgical, and coal-based activities [10]. Despite these strengths, the statistical approaches conventionally used in moss biomonitoring remain primarily descriptive. They are poorly suited to capturing nonlinear interactions among deposition pathways, emission sources, and biological accumulation processes.
As emerging contaminants increasingly exhibit complex environmental behavior, new analytical frameworks are needed to improve detection sensitivity, interpret high-dimensional datasets, and support high-resolution environmental decision-making. Machine learning (ML) is well suited to this context. Ensemble algorithms such as Random Forests [11] and Gradient Boosting [12] can uncover hidden structure within environmental datasets and quantify feature importance. They often predict outcomes more accurately than classical linear methods when relationships are nonlinear. However, ML has not yet been systematically integrated into moss biomonitoring, and no standardized framework exists for combining bioaccumulation data with atmospheric deposition metrics to support predictive risk assessment. To address this gap, we developed Mosses ML, an ML-enhanced workflow that integrates:
(i) trace element concentrations in Hylocomium splendens transplants;
(ii) bulk and dry deposition measurements;
(iii) relative accumulation factors (RAFs);
(iv) site-level environmental metadata (industrial, urban, rural).
The framework builds on validated experimental datasets from Upper Silesia, one of the most heavily industrialized regions in Central Europe. Dry deposition there has repeatedly been shown to dominate particulate-bound metal fluxes. To our knowledge, this is the first study to integrate explainable ML methods (including SHAP analysis) with moss biomonitoring to quantitatively disentangle the relative contributions of dry and bulk deposition to trace element accumulation. Mosses ML combines RAF metrics, deposition measurements, multivariate ordination, and predictive modelling. In doing so, it provides a mechanistic, data-driven evaluation of atmospheric contamination that extends beyond traditional descriptive biomonitoring.
This methodological integration offers a generalizable template for assessing emerging contaminants using bryophyte-based monitoring systems. Specifically, Mosses ML aims to:
- Predict trace-metal concentrations in mosses from local environmental parameters;
- Identify the most influential drivers of contamination using feature importance and SHAP interpretability;
- Classify sampling sites into contamination risk categories with improved accuracy;
- Provide a scalable tool applicable to other bryophyte species and to emerging contaminant groups beyond metals.
This study presents the development, validation, and environmental application of the Mosses ML framework. By uniting established biomonitoring methods with modern computational tools, our approach enhances the interpretative and predictive power of moss-based assessments and strengthens their role as early-warning indicators within atmospheric pollution monitoring and regulatory policy.
2. Materials and Methods
2.1. Study Area and Sampling Design
The monitoring campaign was conducted in Upper Silesia (southern Poland), one of the most industrialized regions in Central Europe. The area is characterized by long-term emissions associated with mining, smelting, metallurgy, and fossil fuel combustion, resulting in elevated concentrations of metal-rich particulates. Fifteen sampling sites were classified into three categories representing a pollution gradient: industrial, urban, and rural. A reference site of low contamination was located near Roztoczański National Park. Site selection criteria included (i) representativeness of land-use type, (ii) absence of canopy cover within a 5 m radius, and (iii) spatial separation sufficient to avoid cross-influence between sampling categories. Figure 1 illustrates the distribution of all locations.
2.2. Moss Transplant Preparation and Exposure Conditions
Hylocomium splendens was selected because of its established performance as a bioindicator of atmospheric contaminants. Moss cushions were collected from an uncontaminated forested area and prepared as 10–12 g composite samples. The material was gently cleaned of debris, placed onto polyethylene mesh pads, and transported to the field sites. At each location, five replicates were placed directly on the soil surface in open areas. This exposed them to natural precipitation and airborne particulate flux while avoiding canopy drip. Transplants remained in the field for 90 days, with an intermediate collection at day 45. Initial (day 0) metal concentrations were measured to calculate relative accumulation factors (RAFs) and to provide baseline information for the ML models.
2.3. Bulk (Wet) Deposition Sampling
Bulk deposition was sampled using polyethylene collectors equipped with 2 L bottles and 10 cm funnels covered by nylon mesh to prevent coarse debris ingress. Collectors were mounted 1.5 m above the ground, with five replicates per site. Thymol was added to inhibit microbial growth. After each precipitation event, samples were transported under cooled conditions and filtered through 0.45 µm cellulose filters. Filtered samples were pooled and analyzed for trace elements after 45 and 90 days of exposure.
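The event samples were pooled before analysis. By mass balance, the pooled concentration of an element is therefore the volume-weighted mean of the individual event concentrations, not their simple average. The short sketch below reproduces that arithmetic, for example to cross-check pooled results against event-level data. The function name and the µg/L and L units are hypothetical choices, and the paper does not state that such a calculation was performed.

```python
def pooled_concentration(concentrations_ug_l, volumes_l):
    """Volume-weighted mean concentration of physically pooled event samples.

    Hypothetical helper: concentrations in ug/L, volumes in L.
    """
    if len(concentrations_ug_l) != len(volumes_l):
        raise ValueError("each event needs exactly one volume")
    total_volume = sum(volumes_l)
    if total_volume <= 0:
        raise ValueError("total pooled volume must be positive")
    # Element mass contributed by each event = concentration x volume.
    total_mass_ug = sum(c * v for c, v in zip(concentrations_ug_l, volumes_l))
    return total_mass_ug / total_volume


# Usage: a small, concentrated event and a large, dilute one.
print(pooled_concentration([6.0, 2.0], [0.5, 1.5]))  # 3.0
```

The simple mean of the two event concentrations would be 4.0. The large, dilute event pulls the pooled value down to 3.0.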
2.4. Dry Deposition Sampling
Dry deposition was quantified using glass plates (80 mm diameter) uniformly coated with pharmaceutical-grade white petrolatum. The plates were pre-weighed (±0.2 mg) after gentle heating (42 °C) to standardize the adhesive layer. Five plates were deployed per site for the full 90-day exposure period. To prevent contamination by wet deposition, each plate was mounted horizontally beneath a small protective roof that effectively shielded the sampling surface from direct rainfall while allowing unrestricted deposition of airborne particles. The shields were designed to minimize airflow disturbance and did not obstruct lateral particle flux. After exposure, particles adhering to the plates were removed with hexane, filtered onto ash-free cellulose filters, dried, combusted, and digested for elemental determination.
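The pre-weighed plates make a gravimetric estimate of dry deposition possible. The sketch below is a hypothetical helper, not a procedure described in the paper. It converts the mass gained by one 80 mm plate over the 90-day exposure into a flux in mg m⁻² day⁻¹. Expressing the load per unit area and per day is an assumption made for illustration.

```python
import math


def dry_deposition_flux(mass_before_mg, mass_after_mg, plate_diameter_mm=80.0, exposure_days=90):
    """Gravimetric dust flux in mg m^-2 day^-1 (hypothetical helper).

    Defaults follow the plate size and exposure period given in the text.
    """
    collected_mg = mass_after_mg - mass_before_mg
    if collected_mg < 0:
        raise ValueError("post-exposure mass is below the pre-exposure mass")
    radius_m = plate_diameter_mm / 2 / 1000      # mm -> m
    area_m2 = math.pi * radius_m ** 2            # about 5.03e-3 m^2 for an 80 mm plate
    return collected_mg / (area_m2 * exposure_days)


flux = dry_deposition_flux(25000.0, 25010.0)  # 10 mg gained by one plate
print(round(flux, 1))  # 22.1
```

At these defaults, a 0.2 mg mass error corresponds to roughly 0.44 mg m⁻² day⁻¹. This gives a sense of how the ±0.2 mg weighing precision propagates into the flux.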
2.4.1. Chemical Analysis of Moss and Deposition Samples
Moss, bulk deposition, and dry deposition samples were digested using a microwave-assisted acid digestion procedure. Approximately 0.3 g of dried moss material or an equivalent mass of dust residue was mineralized using a mixture of ultrapure nitric acid (HNO_3_, 65%) and perchloric acid (HClO_4_, 70%) in a volume ratio of 3:2 (v/v). Digestion was carried out in closed Teflon vessels using a microwave digestion system with a temperature ramp to 180 °C over 15 min, followed by a holding time of 20 min at a maximum operating pressure of approximately 20 bar. After cooling, the digests were diluted to a fixed volume with ultrapure water prior to elemental determination. Elemental concentrations were determined via Flame Atomic Absorption Spectroscopy (FAAS) for Fe, K, Mg, Mn, and Zn and via Electrothermal Atomic Absorption Spectroscopy (ETAAS) for Cd, Co, Cr, Cu, Ni, Pb, and V, using an AVANTA atomic absorption spectrometer (GBC Scientific Equipment, Melbourne, Australia) [13].
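Spectrometer readings refer to the diluted digest, so they must be scaled back to the solid sample. A minimal sketch follows, assuming the usual conversion C = (C_digest − C_blank) × V / m. The text states only that digests were diluted to a fixed volume. The 25 mL used in the usage line is therefore purely hypothetical.

```python
def digest_to_solid(c_digest_ug_l, c_blank_ug_l, final_volume_ml, sample_mass_g):
    """Blank-corrected concentration in the solid, in ug/g (numerically equal to mg/kg)."""
    if sample_mass_g <= 0 or final_volume_ml <= 0:
        raise ValueError("mass and volume must be positive")
    net_ug_l = c_digest_ug_l - c_blank_ug_l                 # subtract procedural blank
    element_mass_ug = net_ug_l * final_volume_ml / 1000.0   # ug/L x L
    return element_mass_ug / sample_mass_g


# Hypothetical example: 0.3 g moss digested and diluted to 25 mL.
print(round(digest_to_solid(12.0, 2.0, 25.0, 0.3), 3))  # 0.833
```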
2.4.2. Quality Control and Assurance
Quality assurance and quality control procedures included the analysis of procedural blanks (acid digestion blanks), filter blanks, and unexposed moss reference material. Blank samples were processed using the same digestion and analytical protocols as field samples. Concentrations of all analyzed elements in blank samples were consistently below the respective limits of detection, indicating negligible contamination during sample preparation and analysis. Analytical accuracy was verified using certified reference materials of moss (M2 and M3, Finnish Forest Research Institute, Helsinki, Finland). Element recoveries ranged from 92% to 106% for all analyzed metals. Method detection limits (LOD) and limits of quantification (LOQ) were calculated as three and ten times the standard deviation of blank measurements, respectively. Element-specific LOD and LOQ values are reported in Supplementary Table S1. Water analysis was validated using SPS-SW1 reference material. All laboratory procedures followed standard QA/QC protocols recommended for moss biomonitoring within ICP Vegetation.
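The detection and quantification limits described above reduce to simple multiples of the blank standard deviation. The sketch below computes them and checks a reference-material recovery. Using the sample standard deviation (n − 1 denominator) is an assumption. The default recovery window simply reuses the 92–106% range reported in the text; it is not a stated acceptance criterion.

```python
import statistics


def detection_limits(blank_values):
    """Return (LOD, LOQ) as 3x and 10x the SD of replicate blank measurements.

    The sample standard deviation (n - 1 denominator) is an assumption.
    """
    if len(blank_values) < 2:
        raise ValueError("at least two blank measurements are required")
    sd = statistics.stdev(blank_values)
    return 3.0 * sd, 10.0 * sd


def recovery_percent(measured, certified):
    """Recovery of a certified reference material, in percent."""
    return 100.0 * measured / certified


def recovery_acceptable(measured, certified, low=92.0, high=106.0):
    # Default window reuses the recovery range reported in the text (illustrative only).
    return low <= recovery_percent(measured, certified) <= high


lod, loq = detection_limits([0.10, 0.20, 0.30])
print(round(lod, 3), round(loq, 3))  # 0.3 1.0
print(recovery_acceptable(9.5, 10.0))  # True
```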
2.4.3. Calculation of Relative Accumulation Factors (RAFs)
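Relative accumulation factors were calculated to quantify metal enrichment relative to initial concentrations:

$$\mathrm{RAF} = \frac{C_{\mathrm{exposed}} - C_{\mathrm{initial}}}{C_{\mathrm{initial}}}$$

where $C_{\mathrm{initial}}$ is the trace element concentration in the moss material before transplantation and $C_{\mathrm{exposed}}$ is the concentration after 90 days of exposure. Positive values indicate net accumulation, and negative values indicate net loss relative to the initial material.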
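The function below assumes the standard definition RAF = (C_exposed − C_initial)/C_initial. This definition also admits the negative values reported for Mn in the Results.

```python
def relative_accumulation_factor(c_initial, c_exposed):
    """RAF = (C_exposed - C_initial) / C_initial.

    > 0: net accumulation; 0: no change; < 0: net loss (e.g. leaching).
    """
    if c_initial <= 0:
        raise ValueError("initial concentration must be positive")
    return (c_exposed - c_initial) / c_initial


print(relative_accumulation_factor(2.0, 5.0))   # 1.5
print(relative_accumulation_factor(10.0, 8.0))  # -0.2
```

For a fixed baseline, the RAF is linear in C_exposed. Averaging the RAFs of the five replicate transplants therefore gives the same value as computing the RAF of their mean concentration.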
2.4.4. Data Preprocessing for Machine Learning
Moss chemistry, bulk and dry deposition data, and site metadata were merged into a unified analytical matrix. Preprocessing comprised outlier removal (values more than 3 × IQR outside the interquartile range), k-nearest-neighbor imputation of missing values, log-transformation of skewed variables, and z-score standardization. PCA was optionally applied for dimensionality reduction. To prevent information leakage, all preprocessing steps were fitted exclusively on the training folds of the cross-validation and then applied to the corresponding test folds.
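The leakage-safe preprocessing described above can be sketched with scikit-learn. Steps that transform features (imputation, log-transform, z-scores, optional PCA) go into a `Pipeline`, so they are refitted on each training fold. Outlier removal drops rows, which a `Pipeline` step cannot do, so it is applied to the training fold inside an explicit cross-validation loop. Several details are assumptions rather than the authors' settings:
- outliers are screened on the target only;
- k = 3 neighbours are used for imputation;
- `log1p` is the log-transform;
- PCA retains 95% of variance.

The data are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import KNNImputer
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def iqr_keep_mask(values, k=3.0):
    """True for values within [Q1 - k*IQR, Q3 + k*IQR]; NaNs are kept for later imputation."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(values, [25, 75])
    iqr = q3 - q1
    inside = (values >= q1 - k * iqr) & (values <= q3 + k * iqr)
    return np.isnan(values) | inside


def build_pipeline(use_pca=False):
    """Preprocessing chain in the order listed in the text, followed by a regressor."""
    steps = [
        ("impute", KNNImputer(n_neighbors=3)),     # k = 3 is a hypothetical choice
        ("log", FunctionTransformer(np.log1p)),    # log(1 + x); assumes non-negative inputs
        ("scale", StandardScaler()),               # z-scores
    ]
    if use_pca:
        steps.append(("pca", PCA(n_components=0.95)))  # keep 95% of variance (assumed)
    steps.append(("model", RandomForestRegressor(n_estimators=200, random_state=0)))
    return Pipeline(steps)


def cv_r2(X, y, n_splits=10, seed=0, use_pca=False):
    """Per-fold R^2; every step, including outlier removal, is fitted on the training fold only."""
    scores = []
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        X_tr, y_tr = X[train_idx], y[train_idx]
        keep = iqr_keep_mask(y_tr)  # drop target outliers in the training fold only
        model = build_pipeline(use_pca).fit(X_tr[keep], y_tr[keep])
        scores.append(model.score(X[test_idx], y[test_idx]))
    return np.array(scores)


# Synthetic demonstration data (not from the study).
rng = np.random.default_rng(0)
X = rng.lognormal(size=(60, 4))
y = 2.0 * np.log1p(X[:, 0]) + rng.normal(scale=0.2, size=60)
X[rng.random(X.shape) < 0.05] = np.nan  # sprinkle missing feature values
scores = cv_r2(X, y)
print(scores.shape)  # (10,)
```

Test folds never influence the imputer, the scaler, the PCA basis, or the outlier fences. This is the property the text relies on to avoid leakage.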
2.4.5. Machine-Learning Models and Workflow
Three model families were employed: Random Forest regression [11], Gradient Boosting regression [12], and a hybrid PCA-enhanced Gradient Boosting model. The predictive targets were moss concentrations of Cd, Pb, Zn, Ni, Cr, and Fe. Input features included deposition metrics, RAF values, and site categories. Models were evaluated by 10-fold cross-validation using R^2^, root mean square error (RMSE), and mean absolute error (MAE). Feature importance in the Random Forest models was quantified by the impurity-based mean decrease in variance. Impurity-based importances can be biased and do not resolve individual predictions. SHAP (SHapley Additive exPlanations) analysis was therefore additionally applied to quantify feature contributions at both the global and the individual-prediction level.
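A minimal scikit-learn sketch of the evaluation workflow follows. It compares the three model families by mean 10-fold cross-validated R², RMSE, and MAE, then ranks features by impurity-based importance. Several elements are assumptions for illustration:
- the hybrid model is built as z-scores, then 3 principal components, then boosting;
- the feature names and synthetic data are hypothetical;
- the hyperparameters are library defaults or arbitrary.

SHAP values could be computed from the fitted forest with the separate `shap` package (for example `shap.TreeExplainer`). That step is omitted to keep the sketch dependency-light.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SCORING = {"r2": "r2", "rmse": "neg_root_mean_squared_error", "mae": "neg_mean_absolute_error"}


def make_models(seed=0):
    return {
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=seed),
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
        # Hybrid variant: z-scores -> principal components -> boosting (assumed design).
        "pca_gradient_boosting": make_pipeline(
            StandardScaler(), PCA(n_components=3), GradientBoostingRegressor(random_state=seed)
        ),
    }


def evaluate(X, y, n_splits=10, seed=0):
    """Mean cross-validated R^2, RMSE, and MAE for each model family."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    summary = {}
    for name, model in make_models(seed).items():
        res = cross_validate(model, X, y, cv=cv, scoring=SCORING)
        summary[name] = {
            "r2": float(res["test_r2"].mean()),
            "rmse": float(-res["test_rmse"].mean()),  # scikit-learn returns negated errors
            "mae": float(-res["test_mae"].mean()),
        }
    return summary


def impurity_ranking(X, y, feature_names, seed=0):
    """Features sorted by Random Forest impurity-based importance (mean decrease in variance)."""
    rf = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]
    return [(feature_names[i], float(rf.feature_importances_[i])) for i in order]


# Synthetic data with hypothetical feature names (not the study's variables).
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 5))
y = 4.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=80)
names = ["dry_deposition", "co_metal", "bulk_deposition", "noise_a", "noise_b"]
results = evaluate(X, y)
ranking = impurity_ranking(X, y, names)
print(results)
print(ranking[0][0])
```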
2.4.6. Risk Classification Model
Contamination-risk categories were assigned by applying percentile thresholds to ML-predicted metal concentrations and deposition loadings:
- very high risk: predicted concentrations exceeded the 75th percentile of the dataset;
- high risk: values fell between the 50th and 75th percentiles;
- moderate risk: values were below the 50th percentile.

The same thresholds were applied to all target metals to ensure comparability of the risk classifications. Percentile thresholds were chosen to provide a data-driven yet conservative classification scheme rather than to reflect regulatory exceedance criteria. Classification performance was compared with that of a classical threshold-based approach using accuracy, F1-score, and ROC metrics.
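The percentile scheme above can be expressed in a few lines of NumPy, with scikit-learn metrics for comparing two labellings. The text does not say how values falling exactly on a percentile are handled. Here they are treated as "high", which is an assumption. Macro-averaged F1 is likewise an assumed choice. One-vs-rest ROC analysis would additionally require continuous class scores and is omitted.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score


def classify_risk(predicted):
    """Risk classes from the 50th and 75th percentiles of predicted concentrations.

    Values equal to P50 or P75 are labelled 'high' (boundary handling is an assumption).
    """
    predicted = np.asarray(predicted, dtype=float)
    p50, p75 = np.percentile(predicted, [50, 75])
    labels = np.full(predicted.shape, "high", dtype=object)
    labels[predicted > p75] = "very high"
    labels[predicted < p50] = "moderate"
    return labels.tolist()


def compare_classifications(reference, candidate):
    """Accuracy and macro-averaged F1 of one labelling against a reference labelling."""
    return accuracy_score(reference, candidate), f1_score(reference, candidate, average="macro")


labels = classify_risk([1, 2, 3, 4, 5, 6, 7, 8])
print(labels)  # ['moderate', 'moderate', 'moderate', 'moderate', 'high', 'high', 'very high', 'very high']
```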
2.5. Software
Analyses were conducted in Python 3.11 (scikit-learn, pandas, numpy, shap) and R 4.3.1 for PCA visualization.
3. Results
3.1. Metal Concentrations and Relative Accumulation Factors
Descriptive statistics for metal concentrations in Hylocomium splendens transplants after 90 days of exposure are presented in Table 1. Metal concentrations exhibited a clear spatial gradient across the study area (Figure 2). The highest values were consistently observed at industrial sites, intermediate levels at urban sites, and the lowest concentrations at rural sites. This pattern was evident for Pb, Cd, Zn, Fe, Ni, and Cr. RAFs calculated after 90 days of exposure are summarized in Table 2. RAF values confirmed the pollution gradient, with industrial sites showing the highest enrichment for most elements, particularly Pb, Cd, Zn, and Fe. In contrast, Mn exhibited consistently negative or near-zero RAF values across all site categories. RAFs are normalized to the initial (background) metal concentrations of the transplanted moss material. Consequently, once background variability is taken into account, high absolute Pb concentrations at industrial sites do not necessarily translate into proportionally higher RAF values. This normalization explains the apparent discrepancy between absolute Pb concentrations and RAF magnitudes across site categories.
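A hypothetical numerical illustration of this normalization follows. The values are invented for clarity and are not taken from Tables 1 or 2. It uses RAF = (C_exposed − C_initial)/C_initial with concentrations in mg kg⁻¹. Transplant A starts at 10 and ends at 40; transplant B starts at 5 and ends at 25. A ends with the higher absolute concentration but the lower RAF, because it started from a higher baseline:

$$
\mathrm{RAF}_A = \frac{40 - 10}{10} = 3.0, \qquad \mathrm{RAF}_B = \frac{25 - 5}{5} = 4.0
$$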
3.2. Atmospheric Deposition Patterns
Atmospheric deposition data are presented in Tables 3 and 4. Bulk deposition showed moderate variability among site categories, with higher concentrations generally recorded at industrial locations. Correlation analysis indicated moderate relationships between bulk deposition and moss concentrations for selected elements, particularly Cd, Ni, and Zn (Table 3). Dry deposition, in contrast, differed more strongly among site categories (Table 4), with substantially higher concentrations of Pb, Zn, Cr, Fe, and Cu at industrial sites. Correspondingly, moss–deposition correlations were strongest for dry deposition. These covered Cd, Cr, Cu, Mn, Ni, Pb, and Zn at industrial sites and multiple elements at urban and rural sites.
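The text does not specify which correlation coefficient was used. The sketch below therefore offers both Pearson's r and a simple Spearman rank correlation, computed separately for each site category. The helper names are hypothetical, and the rank implementation assumes there are no tied values.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def spearman_rho(x, y):
    """Spearman rank correlation as Pearson on ranks (assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson_r(rank_x, rank_y)


def correlations_by_category(moss, deposition, categories, method=pearson_r):
    """Moss-deposition correlation computed separately for each site category."""
    moss, deposition, categories = map(np.asarray, (moss, deposition, categories))
    return {
        str(cat): method(moss[categories == cat], deposition[categories == cat])
        for cat in np.unique(categories)
    }


res = correlations_by_category(
    [1.0, 2.0, 3.0, 3.0, 2.0, 1.0],
    [2.0, 4.0, 6.0, 1.0, 2.0, 3.0],
    ["industrial"] * 3 + ["rural"] * 3,
)
print(res)
```

With only a handful of sites per category, each coefficient rests on very few points and should be read with caution.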
3.3. Multivariate Analysis (PCA)
Principal component analysis (PCA) of standardized moss metal concentrations revealed clear separation among rural, urban, and industrial sites (Figure 3). The first two principal components (PC1 and PC2) together explained 76.7% of the total variance. PC1 was primarily associated with Pb, Cd, Zn, Fe, and Cr. Rural sites clustered at low PC1 scores and industrial sites at high positive PC1 scores.
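The sketch below shows how the cumulative explained variance and PC scores of standardized concentrations can be obtained with scikit-learn. It runs on synthetic data in which one shared factor drives six metals. The sign of a principal axis is arbitrary. The sketch therefore orients PC1 so that its loadings sum to a positive value, an assumed convention that makes statements such as "industrial sites at high positive PC1 scores" well defined.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_on_standardized(concentrations, n_components=2):
    """PCA of z-scored concentrations (rows = transplants, columns = metals).

    Returns scores, loadings, and the cumulative explained-variance fraction.
    """
    z = StandardScaler().fit_transform(concentrations)
    pca = PCA(n_components=n_components).fit(z)
    scores = pca.transform(z)
    loadings = pca.components_.copy()
    # Axis signs are arbitrary: orient PC1 so its loadings sum to a positive value, so that,
    # when the metals co-vary positively, high PC1 scores mean high overall concentrations.
    if loadings[0].sum() < 0:
        loadings[0] *= -1
        scores[:, 0] *= -1
    return scores, loadings, float(pca.explained_variance_ratio_.sum())


# Synthetic example: one shared "industrial" factor drives six metals plus noise.
rng = np.random.default_rng(2)
factor = rng.normal(size=30)
weights = np.array([1.0, 0.9, 0.8, 0.9, 0.7, 0.6])
X = factor[:, None] * weights + rng.normal(scale=0.3, size=(30, 6))
scores, loadings, cumulative = pca_on_standardized(X)
print(round(cumulative, 3))
```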
3.4. Machine-Learning Model Performance
Model performance metrics are summarized in Table 5. Models were compared using the cross-validated coefficient of determination (R^2^), root mean square error (RMSE), and mean absolute error (MAE). The R^2^ values reported in this section reflect distinct evaluation contexts:
- Table 5 reports cross-validated performance;
- Figure 4 shows predicted versus observed values for a single train–test split, for visualization purposes;
- Figure 5 presents in-sample diagnostics associated with the feature-importance analysis.

These values serve complementary roles and are not directly comparable. In particular, the lower R^2^ values in Figure 4 reflect a single test split. Random Forest regression achieved high predictive accuracy for key metals, with cross-validated R^2^ values approaching 0.90 for Pb, Cd, and Zn. In the predicted-versus-observed plots (Figure 4), points clustered around the 1:1 line, consistent with the low RMSE values. Feature importance rankings derived from the Random Forest model (Figure 5; Table 6) identified dry deposition load and co-occurring metal concentrations as the most influential predictors. SHAP analysis further quantified feature contributions, highlighting the dominant role of dry deposition mass and associated metals (Figure 6).
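Why do in-sample, single-split, and cross-validated R² values differ? To illustrate, the sketch below scores one Random Forest three ways on synthetic, deliberately noisy data. The model settings, the 25% test fraction, and the data are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split


def r2_contexts(X, y, seed=0):
    """R^2 of one model under three evaluation contexts."""
    model = RandomForestRegressor(n_estimators=200, random_state=seed)

    # (1) In-sample: trained and scored on the same data (optimistic).
    in_sample = r2_score(y, model.fit(X, y).predict(X))

    # (2) One held-out split, as used for a predicted-vs-observed plot.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    single_split = r2_score(y_te, model.fit(X_tr, y_tr).predict(X_te))

    # (3) 10-fold cross-validation, averaged over folds.
    cv = KFold(n_splits=10, shuffle=True, random_state=seed)
    cross_validated = float(cross_val_score(model, X, y, cv=cv, scoring="r2").mean())

    return {"in_sample": in_sample, "single_split": single_split, "cross_validated": cross_validated}


rng = np.random.default_rng(3)
X = rng.normal(size=(60, 4))
y = X[:, 0] + rng.normal(scale=1.0, size=60)  # noisy synthetic target
contexts = r2_contexts(X, y)
print(contexts)
```

Averaging per-fold R² (as here) and computing R² on pooled out-of-fold predictions can give different numbers. It helps to state which convention a table uses.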
4. Discussion
The spatial gradients observed in Hylocomium splendens metal concentrations reflect the strong industrial influence characteristic of Upper Silesia, a region affected by long-term mining, smelting, and fossil-fuel combustion. The pronounced enrichment of Pb, Cd, Zn, and Fe at industrial sites is consistent with previous biomonitoring studies conducted in Central Europe. It confirms the suitability of H. splendens as a sensitive indicator of particulate-bound atmospheric pollution. The metal enrichment levels observed in H. splendens transplants from Upper Silesia are comparable to those reported for heavily industrialized regions of Europe, including parts of Norway, Serbia, and Spain [14,15,16]. In those regions, elevated concentrations of Pb, Cd, and Zn in mosses have similarly been linked to metallurgical activity, mining, and fossil-fuel combustion. Compared with these studies, the present dataset exhibits comparable or higher accumulation intensities at industrial sites, underscoring the severity of particulate pollution in the study area.
The high RAF values obtained for Pb, Cd, and Zn indicate efficient retention of these metals within moss tissues. These elements are known to form strong bonds with cation-exchange sites in moss cell walls, resulting in limited remobilization after deposition [19]. The lower Mn concentrations observed at industrial sites do not contradict the general enrichment trend for other trace metals. Manganese behaves differently in moss tissues: it is more susceptible to physiological regulation, leaching, and washout than particulate-bound metals such as Pb, Cd, or Zn [17,18]. Negative or near-zero RAF values for Mn have been widely reported in both active and passive moss biomonitoring studies and are commonly attributed to post-depositional remobilization. Accordingly, the consistently negative Mn RAFs observed here are in line with well-documented leaching and physiological regulation processes, rather than reflecting reduced atmospheric input [20].
The comparison between bulk and dry deposition highlights the dominant role of particulate transport in shaping moss chemistry in the study area. While bulk deposition contributed to the accumulation of more soluble elements, dry deposition exerted a substantially stronger influence across all site categories. This finding supports previous evidence that H. splendens efficiently intercepts coarse and fine particulate matter, owing to its highly branched morphology and surface roughness [6,16,21].
Multivariate and machine-learning analyses provided complementary insights into these patterns. PCA confirmed that industrial emissions generate coherent metal assemblages that clearly separate polluted from background sites. The machine-learning models further demonstrated that moss metal concentrations can be accurately predicted from deposition metrics and site characteristics, outperforming traditional statistical approaches. SHAP analysis identified strong contributions from dry deposition and co-occurring metals. This reinforces the mechanistic interpretation that industrial particulates are the primary vector of metal accumulation in mosses. The integration of machine-learning models thus extends previous approaches by providing predictive capability and a quantitative assessment of feature importance, both of which have rarely been applied in moss biomonitoring studies to date.
Several limitations should nevertheless be acknowledged. The relatively small number of sampling sites (n = 15) limits the complexity and generalizability of the ML models. The study also reflects a single 90-day exposure period, and longer-term or multi-season monitoring would improve model robustness. Finally, external validation using independent datasets is needed to fully assess the transferability of the Mosses ML framework to other regions and biomonitor species.
5. Conclusions
This study demonstrates that integrating machine-learning techniques with traditional moss biomonitoring yields a substantially enhanced analytical framework for assessing atmospheric contamination. Hylocomium splendens transplants proved to be highly effective bioindicators of particulate-bound metals, consistently reflecting the strong industrial pollution gradient observed across the study area. Dry deposition emerged as the dominant predictor of metal uptake, confirming that mosses act primarily as collectors of airborne particulates [22,23]. This finding is further supported by high RAF values, PCA separation, and SHAP-based mechanistic insights. The Mosses ML workflow achieved high predictive accuracy for key contaminants, surpassing the diagnostic capacity of classical statistical approaches. Its improved risk classification and transparent model interpretability address long-standing challenges in biomonitoring, particularly in complex industrial landscapes where pollutant sources co-occur. The framework combines biological indicators, deposition measurements, and advanced computational tools, and so enhances environmental surveillance. Subject to external validation, it offers a scalable model that could be transferred to future biomonitoring initiatives, including multi-year monitoring, cross-regional comparisons, and assessments of emerging contaminants.
References
- [1] Berg T., Steinnes E. Use of mosses (Hylocomium splendens and Pleurozium schreberi) as biomonitors of heavy metal deposition: From relative to absolute deposition values. Environ. Pollut. 1997, 98, 61–71. doi:10.1016/S0269-7491(97)00103-6. PMID: 15093345.
- [2] Markert B.A., Breure A.M., Zechmeister H.G. Bioindicators and Biomonitoring; Elsevier: Amsterdam, The Netherlands, 2003.
- [3] Ares A., Aboal J.R., Carballeira A., Giordano S., Adamo P., Fernández J.A. Moss bag biomonitoring: A methodological review. Sci. Total Environ. 2012, 432, 143–158. doi:10.1016/j.scitotenv.2012.05.087. PMID: 22728302.
- [4] UNECE ICP Vegetation. Heavy Metals in European Mosses: 2000/2001 Survey; Centre for Ecology & Hydrology, University of Wales Bangor, UK, 2003.
- [5] Berg T., Røyset O., Steinnes E. Moss (Hylocomium splendens) used as biomonitor of atmospheric trace element deposition: Estimation of uptake efficiencies. Atmos. Environ. 1995, 29, 353–360. doi:10.1016/1352-2310(94)00259-N.
- [6] Steinnes E., Hanssen J.E., Rambæk J.P., Vogt N.B. Atmospheric deposition of trace elements in Norway: Temporal and spatial trends studied by moss analysis. Water Air Soil Pollut. 1994, 74, 121–140. doi:10.1007/BF01257151.
- [7] Halleraker J.H., Reimann C., de Caritat P., Finne T., Kashulina G., Niskavaara H., Bogatyrev I. Reliability of moss (Hylocomium splendens and Pleurozium schreberi) as bioindicators of atmospheric chemistry in the Barents region: Interspecies and field duplicate variability. Sci. Total Environ. 1998, 218, 123–139. doi:10.1016/S0048-9697(98)00205-8.
- [8] Fernández J.A., Aboal J.R., Couto J.A., Carballeira A. Sampling optimization at the sampling-site scale for monitoring atmospheric deposition using moss chemistry. Atmos. Environ. 2002, 36, 1163–1172. doi:10.1016/S1352-2310(01)00575-1.
