Resource Characteristics of Common Reed (Phragmites australis) in the Syr Darya Delta, Kazakhstan, by Means of Remote Sensing and Random Forest
Azim Baibagyssov, Anja Magiera, Niels Thevs, Rainer Waldhardt

TL;DR
This study estimates reed biomass in Kazakhstan's Syr Darya Delta using satellite data and machine learning, revealing high potential for sustainable bioeconomy.
Contribution
The study introduces a novel application of Random Forest regression with Sentinel-2 data to map and quantify reed biomass in large wetland areas.
Findings
Submerged dense reed had the highest biomass at 28.21 t ha−1 in the Syr Darya Delta.
Random Forest models using NDVI and NDWI achieved high predictive accuracy (R2 = 0.93 during calibration).
The study identified 1,240,789 tons of reed biomass resources in the region.
Abstract
Reed beds, often referred to as dense, nearly monotonous extensive stands of common reed (Phragmites australis), are the most productive vegetation form of inland waters in Central Asia and exhibit great potential for biomass production in such a dryland setting. With its vast delta regions, Kazakhstan has the most extensive reed stands globally, providing a valuable case for studying the potential of reed beds for the bioeconomy. However, accurate and up-to-date figures on available reed biomass remain poorly documented due to data inadequacies in national statistics and challenges in measuring and monitoring it over large and remote areas. To address this gap in knowledge, in this study, the biomass resource characteristics of common reed were estimated for one of the significant reed bed areas of Kazakhstan, the Syr Darya Delta, using ground-truth field-sampled data as the dependent…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11- —German Academic Exchange Service (DAAD) for this research and the APC under the program Development-Related Postgraduate Courses (EPOS)
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsLand Use and Ecosystem Services · Remote Sensing and LiDAR Applications · Coastal wetland ecosystem dynamics
1. Introduction
Inland wetlands rank among the most biologically productive ecosystems in the world and are home to some large aquatic plants, also known as macrophytes [1,2,3,4]. These plants, adapted to grow partially or fully submerged in water, significantly enhance the ecological complexity and functionality of these vital habitats while providing substantial ecological, economic, and social benefits [5,6,7].
One prominent example of such aquatic plant species worldwide is the common reed (Phragmites australis) [8,9]. This tall, perennial emergent graminoid is recognized for its high net primary productivity and extensive global distribution. Typically classified as a hydrophyte or helophyte, it flourishes in waterlogged and inundated areas with little competition from other vegetation. As a result, it often dominates the aquatic flora of inland waters across diverse climatic zones and geographical regions, forming extensive, nearly homogeneous stands in deltas, along lakeshores, and in almost any natural or man-made marshy area [10,11].
Although these characteristics position common reed as a key species in various aquatic and freshwater ecosystems worldwide, its role within natural communities is perceived controversially [12]. Depending on the social and ecological context in which common reed and its ecosystem services are embedded, it is regarded as beneficial for wetland ecosystems or an invasive species [13]. For instance, in some areas of North America, its rapid spread has been identified as detrimental to native species and overall wetland biodiversity; hence, management measures have been taken to curb its spread [14,15,16]. In other regions, however, it is recognized for its multifunctional roles within wetland ecosystems, such as providing habitats for a number of other species and as a source of biomass, highlighting its capacity to serve as a feedstock source for the bio-based circular economy and categorizing it among the promising sources of renewable biomass worldwide [17,18,19]. Moreover, its high levels of peat formation make common reed the most suitable and frequently used wetland plant for paludiculture and phytoremediation [20,21,22]. Therefore, common reed has the potential to be an aquatic plant that provides a number of benefits [23].
The vast areas of reed beds across the wetlands of Central Asia make this region particularly relevant to exploring the benefits of this aquatic plant [24,25]. It produces huge amounts of lignocellulosic biomass, representing a significant albeit often overlooked and underestimated resource that can be tapped as a raw material for the bioeconomy, even if only some of these reeds are used to ensure space for biodiversity conservation [26,27]. Currently, small quantities of reed are traded internationally in the form of thatch. But, with the growing global demand for sustainably sourced biomass [28], these vast reed biomass resources could develop into a valuable domestic source of feedstock for a bio-based circular economy and generate economic opportunities for rural areas [26,27]. Though this potential is receiving increasing attention from local stakeholders and policymakers, fine-scale national inventories of a spatially detailed assessment of common reed’s above-ground biomass, which serve as a basis for sound regional land-use planning, including reed biomass sourcing, are lacking across the whole region of Central Asia. Therefore, accurate and cost-effective methods for estimating above-ground and above-water standing reed biomass are essential for successfully establishing and maintaining reed stocks for the bioeconomy while preventing overutilization. In addition to biomass, reed stands provide various ecosystem services, including animal fodder provision, water cycle regulation, and recreational opportunities [24,25,29,30].
Kazakhstan is home to the world’s most extensive reed bed areas, accounting for 20% of the total global expanse [23]. This is mainly attributed to its extensive wetland complexes in the river deltas and along inland water bodies [26]. Despite this valuable opportunity to explore the potential of reed beds for the bioeconomy, assessments of the area and distribution of reed beds, as well as management advice in these delta regions, remain limited. While a detailed assessment and mapping of wetlands area and reed biomass resources in the Ili Delta at Lake Balkhash have been previously conducted [24], similar studies for other major reed bed areas in the country—such as the Syr Darya Delta at the Lesser Aral Sea, the Ural Delta at the Caspian Sea and similar—have yet to be performed. Consequently, reliable and up-to-date information regarding the area, the spatial distribution of reed beds, and their biomass potential in these areas is still missing, even though further research into the utilization of reed beds and their integration into landscape planning in Kazakhstan primarily focuses on these deltas.
In light of this background, the objectives of this study were to model a spatially explicit yield map for Phragmites australis-dominated wetlands and contribute to closing the gap of missing reed distribution and biomass data by quantifying the area and standing biomass of reed beds in the Syr Darya Delta, Kazakhstan. The wetland ecosystems within the Syr Darya Delta near the Aral Sea, including the areas of abandoned cropland now covered by reed of the adjacent Qazaly Irrigation Zone, were selected as a model case study for conducting such a mapping exercise. Hence, this study may serve as a blueprint for further mapping efforts of reed areas and biomass in analogous contexts and contribute to building a better knowledge base on submerged plants and options for their utilization, thereby facilitating informed planning and management.
To guide this research, two key questions motivated the study: (1) What is the area and spatial distribution of wetlands and Phragmites australis-dominated vegetation in the Syr Darya Delta, divided into submerged and non-submerged reed beds? (2) Building on the preceding, what is the standing biomass of Phragmites australis in the Syr Darya Delta? Here, the term “standing biomass” thereby refers to fresh biomass and excludes all dead biomass from previous years.
To address these research questions, this paper presents, for the first time, a spatially explicit yield map of Phragmites australis-dominated wetlands in the Syr Darya Delta, Kazakhstan utilizing remote sensing and Random Forest (hereafter “RF” or “rf”) [31] regression modeling to assess their area and biomass potential. The application of data from the Sentinel-2 satellite, with its high-resolution imagery (10 m × 10 m) and frequent revisiting times of 3–5 days, is particularly suited for monitoring dynamic ecosystems like wetlands that rely on fluctuating water supplies. By comparing the standing biomass in the Syr Darya Delta to that in the Ili Delta and other regions, as well as to traditional biomass resources such as wood, we report on the considerable biomass potential of common reed in the region’s wetlands, emphasizing its ecological and socio-economic significance as a sustainable feedstock source for the bioeconomy, particularly in arid regions like Central Asia.
2. Results
2.1. Plot-Level Above-Ground and Above-Water Surface Standing Biomass of Reed Beds and Relationships with Remote Sensing Variables
The dry standing above-ground and above-water surface biomass of reed beds (hereafter “Phragmites biomass” or “reed biomass”) ranged from 1.14 t ha^−1^ to 51.33 t ha^−1^, with a mean biomass of 15.94 t ha^−1^ (Figure 1).
A comparative analysis of reed biomass across the various land cover classes revealed that the land cover class with the submerged dense reed exhibited the highest mean biomass at 28.21 t ha^−1^. This was followed by the land cover class of the dense non-submerged reed with a mean biomass of 15.24 t ha^−1^, while the land cover class of the open reed and shrub vegetation had a significantly lower mean biomass of 4.36 t ha^−1^, as summarized in Table 1. Notably, the maximum Phragmites biomass of 51.33 t ha^−1^ was recorded within the land cover class with the submerged dense reed.
An examination of correlation coefficients, conducted prior to constructing the spatially detailed yield assessment models using the Random Forest machine learning algorithm, revealed considerable variability in the strength of the correlation between the above-ground and above-water standing biomass of reed beds (referred to as biomass) and the spectral data obtained from the Sentinel 2 satellite, which served as explanatory variables in the regression models.
With an R^2^ of 0.72, the NDVI exhibited a robust positive correlation with the standing biomass values of reed beds, followed by the NIR, with an insignificant positive relationship. The rest, conversely, displayed a negative correlation with the NDWI with an R^2^ of −0.64, significantly leading and suggesting an inverse relationship with the standing biomass.
2.2. The Random Forest Model Performance and Mapping of Above-Ground and Above-Water Surface Reed Biomass and Spatial Distribution Patterns
The R^2^ values ranged from 0.34 to 0.81, while the RMSE values ranged from 4.08 t ha^−1^ to 10.93 t ha^−1^ (see Appendix A for additional details).
Among the 24 RF regression models developed based on reed biomass data alone, the rf37 model (illustrated in Figure 2a) showed the highest predictive accuracy, with an RMSE of 6.81 t ha^−1^ and an R^2^ of 0.77 during calibration. After validation, it maintained an RMSE of 6.33 t ha^−1^ and an R^2^ of 0.68. Conversely, among the other 24 RF regression models that incorporated both reed biomass and land cover data, the rf41 model (shown in Figure 2b) demonstrated superior performance, achieving the lowest RMSE of 4.08 t ha^−1^ and the highest R^2^ of 0.81 during calibration, followed by an RMSE of 5.41 t ha^−1^ and an R^2^ of 0.79 during validation. The performance of these models in estimating standing reed biomass, as evaluated on the testing (validation) dataset during model validation, is depicted in the scatter plots comparing the predicted standing biomass to the observed standing biomass of the validation dataset (Figure 2).
Considering these model performance results, a final map (WGS 84/UTM zone 41 N) (Figure 3) illustrating the spatial distribution of above-ground and above-water surface Phragmites biomass in the Syr Darya Delta, Kazakhstan, was produced using the model of the best fit, i.e., the rf41 model. Performed in the Forest-Based Classification and the Regression Tool of ArcGIS Pro 3.2.2, this mapping yielded a slightly higher predictive accuracy with an R^2^ = 0.93 and a lower RMSE = 2.74 t ha^−1^ during the model calibration, along with an R^2^ = 0.83 and an RMSE = 3.71 t ha^−1^ at its validation. The slightly better outputs can be attributed to ArcGIS Pro’s advanced geospatial analysis capabilities and optimized algorithms to handle spatial data more effectively.
This resulting map clearly delineates the spatial distribution of the above-ground and above-water surface standing reed biomass at the landscape level within the Syr Darya Delta, Kazakhstan. According to its spatial distribution patterns, dense standing Phragmites biomass is predominantly located along the Syr Darya River’s right (Figure 3a) and left (Figure 3c) branches and along the shores of deltaic lakes. These lakes are distributed in the northern, northwestern, southern, and southeastern parts of the Syr Darya Delta, Kazakhstan. In contrast, the central part of the delta showcases lower biomass densities as it is primarily occupied by irrigated croplands (Figure 3b). In this area, reed growth is concentrated mainly along irrigation canals and in abandoned as well as post-flooding lands unsuitable for regular cropping.
2.3. Reed Bed Area Assessment and Biomass Resource Quantification
The area of each of the classes from Figure 3 is listed in Table 2.
The total above-ground and above-water surface biomass of Phragmites australis within the study area for 2019–2020 was estimated at approximately 2.5 million tons. A significant portion of the study area’s land, accounting for 75.4%, falls within a broad land cover class that includes open water, bare land with open sands, and areas considered to be open reed and shrub vegetation. This is followed by the land cover class with non-submerged dense reed beds (18.9%) and the land cover class with submerged dense reed beds (5.7%).
The combined area of land cover classes of dense non-submerged and submerged reed beds—exhibiting standing biomass greater than 10.5 t ha^−1^—totaled 58,935 hectares. This area translates to an estimated 1,240,789 tons of reed biomass. These considerable biomass resource figures highlight the abundant availability of Phragmites biomass within the wetlands of the Syr Darya Delta. Furthermore, they underscore the significant economic potential of this resource for the sustainable livelihoods of the local population, particularly in its prospective role as a domestic feedstock for sustainable biomass utilization in the region.
3. Discussion
3.1. Estimated Reed Biomass and Its Spatial Distribution in the Syr Darya Delta, Kazakhstan
The Phragmites biomass of the submerged dense reed class in this study is in the same range as the Phragmites biomass measured in Lake Burullus in Egypt, 54 t/ha, as reported by [32], the Ili Delta in Kazakhstan, 5.5 to 57.2 t/ha according to [24], or further studies listed by [33]. The reed biomass values of the land cover classes non-submerged dense reed and open reed and shrub vegetation of this study are higher than the corresponding values from the Ili Delta [24]. This difference might be explained by a higher grazing pressure observed in the Ili Delta compared to the study area at the Syr Darya, as the two non-submerged reed land cover classes in the Ili Delta are frequently grazed. Such grazed areas show a lower NDVI and, thus, lower biomass compared to non-grazed areas.
The spatial distribution of the reed area with high biomass is particularly pronounced along river branches, irrigation canals, and deltaic lake shores. This pattern mirrors observations made in other deltas, such as the Ili Delta, and indicates that hydrological factors play a vital role in determining reed biomass distribution in these continental arid regions. This is further supported by the relationship between reed biomass and NDWI, which is the second most important variable across all models in this context. Water bodies create optimal conditions for reed growth, as they are regularly inundated and maintain consistent moisture throughout the growing season [34,35].
In contrast, the desiccation and drying of constricted and small deltaic lakes adversely impact the growth and viability of reed beds, leading to their localized thinning, a reduction in extent due to an elevation of extremely unfavorable salinity in both water and soil, and drying of the soil [33,36].
Moreover, the productivity of natural reed beds is shaped not only by the ecological features of their environment but also the genetic variation and phenotypic traits of local ecotypes, as discussed by [37,38]. While [36] suggested that the phenotypes of Phragmites australis in continental arid environments align with flood regimes and topsoil salinization, [33] reported substantial variability in productivity among different Phragmites australis phenotypes in relation to their specific growth conditions and grazing pressures identified in Southern Xinjiang. Echoing these findings, varying phenotypes of Phragmites australis have also been observed within this study in the Syr Darya Delta; however, the genetic variation patterns across these populations within this arid region of Central Asia and their implications for productivity remain unexplored, highlighting a critical area for further research.
3.2. Implications for Management Strategies and Sustainable Use of Reed Biomass Resources
The biomass that can be harvested annually from the submerged reed as raw material is substantially higher than from tree plantations or forests, e.g., Populus and Picea schrenkiana, which are the most important timber and fuel wood species in the southern part of Kazakhstan. Shatalov (1973) [39] reports a volume stock of 465 m^3^/ha for a 30-year-old Populus nigra stand in the steppe zone of Kazakhstan. This corresponds to an average annual growth of 5.47 t/ha (considering a wood density of 0.353 g/cm^3^ according to Chave et al. [40]). Notably, the reed biomass of the submerged reed in this study is in the same range of the most optimistic assumption for the yield of a poplar plantation with modern poplar cultivars [41], thereby underscoring the potential of the reed as a viable biomass resource.
Similarly, for a 60-year-old P. schrenkiana stand, Kozlovskiy and Pavlov [42] listed a volume stock of 374 m^3^/ha, which corresponds to an average annual increment of 2.26 t/ha, again considering a wood density of 0.362 g/cm^3^ [40]. These comparisons emphasize the remarkable biomass potential of submerged reed in the region, which may yield substantial ecological and economic benefits through improved harvesting and utilization practices.
The strategic harvesting and utilization of reed biomass present a dual opportunity: it could substantially reduce the country’s reliance on imported wood and lower pressure on forests across Kazakhstan. By incorporating reed biomass into various industrial applications, such as the production of chipboards—a widely used material for residential construction across Central Asia—stakeholders could create a sustainable alternative that fosters local economic development while promoting the conservation and wise use of wetland and forest ecosystems [24,43].
Furthermore, establishing a market for reed biomass not only aids in environmental conservation but also contributes to the bioeconomy, particularly in underdeveloped rural areas such as the downstream regions of Kazakhstan [26]. Consequently, it is essential for policymakers to formulate management strategies that prioritize research, incentives for sustainable harvesting, and technological advancements in biomass processing. Through these initiatives, Kazakhstan can pave the way for an eco-efficient approach to resource utilization that aligns with global sustainability goals. Ultimately, the recognition of reed biomass as a primary resource can facilitate a more harmonious balance between economic development and ecological stewardship in the region.
3.3. Limitations of Above-Ground and Above-Water Surface Wetland Biomass Assessment Using Random Forest Predictive Modeling and Satellite Data
In recent years, there has been growing scientific interest in using the Random Forest machine learning algorithm and remote sensing techniques to estimate above-ground biomass in various ecosystems. Significant improvements in the mapping and assessment of above-ground biomass have been reported for ecosystems such as forests [44,45,46], including mangroves [47,48], grasslands [49,50,51], rangelands [52,53], savannas and woodlands [54,55,56,57], and wetlands [58,59,60]. Likewise, this study demonstrates the effectiveness and robustness of the Random Forest regression model in combination with high-resolution Sentinel-2 remote sensing data and ground-truth data in mapping and quantifying Phragmites biomass within the wetland ecosystems of the Syr Darya Delta, Kazakhstan.
The Normalized Difference Vegetation Index (NDVI) showed a strong positive correlation with the standing biomass of reed beds, confirming its utility as a reliable predictor for biomass. In contrast, the Normalized Difference Water Index (NDWI) exhibited a significant inverse correlation, indicating an alternative ecological dynamic related to water content and biomass levels. However, consistent with findings from other biomass assessment studies [58,61,62], we observed an asymptotic saturation trend in the NDVI as Phragmites biomass increased. Due to this phenomenon, the predictive accuracy of the Random Forest regression model is limited, particularly when the observed biomass exceeds 30 t ha^−1^ in areas with dense submerged stands.
Despite the practicality of the Random Forest regression method, several challenges and limitations are often encountered during its application to map and assess wetland above-ground and above-water biomass. For instance, the complexity of spectral signatures in wetland environments—due to the presence of water, shadowing, and varying vegetation densities—makes it challenging to accurately differentiate and quantify above-ground and above-water biomass [63,64]. Additionally, validating biomass estimation models derived from the Random Forest algorithm and high-resolution Sentinel-2 data requires comprehensive and robust ground-truth data collection; uncertainties linked to field measurements may affect above-ground biomass inversion due to spatial and temporal variability in plant community characteristics and seasonal productivity, such as vegetation phenology [62,65,66].
Discrepancies may arise from sampling errors, variations in the spatial resolution of biomass sampling sites in relation to the resolution of raster data, or differences in the timing of data collection compared to satellite overpass, all of which can introduce spatial and seasonal biases affecting the validation of biomass estimation models. While collecting ground-truth data is critical, it is often labor-intensive [67], resource-demanding, and logistically challenging, especially in remote and inaccessible wetland regions [58,68].
It is crucial to explicitly consider and maintain a balanced perspective on these inherent uncertainties to overcome limitations, advance methodologies, and enhance the robustness of wetland biomass estimation in similar ecological or bioeconomic studies. Refining ground-truth data collection protocols should be prioritized both before and during sampling in the field. Furthermore, implementing strategies for mitigating spatial and temporal autocorrelation during sampling and model training procedures following [69], along with leveraging data from multiple sensors—such as integrating UAV (Unmanned Aerial Vehicle) observations with satellite remote sensing [70] or combining UAV-LiDAR (Light Detection and Ranging) data with multispectral imagery [71]—as well as improving the temporal scope, will be crucial considerations for future studies. These strategies can provide biomass assessments with increased accuracy and very high spatial resolution, thereby refining ecological and bioeconomic modeling efforts.
4. Materials and Methods
4.1. Study Area
Our study focuses on the Syr Darya Delta near the northeastern shore of the former Aral Sea in the Kyzylorda Province of Kazakhstan. It is part of the Lesser Aral Sea and Delta of the Syrdarya River Ramsar Site, which the Government of Kazakhstan designated as a Wetland of International Importance in 2012 [72,73]. Within this Ramsar Site, there are two Important Bird Areas (IBAs): the Lesser Aral Sea (144,165 ha) and Syrdarya Delta Lakes (139,400 ha) (as shown in Figure 4).
The current Syr Darya Delta and its adjacent areas are a large sedimentary–alluvial plain with a continental semi-arid to arid climate and annual precipitation below 200 mm [74]. It rests within the Turanian Depression and has a typical slightly undulating lowland terrain, with higher elevation in the north and west and lower elevation in the south. According to Zinabdin [73], the delta stretches about 80 km from the Basykara Dam in the east to the coast of the Northern Aral Sea (NAS) in the west. From north to south, it reaches 120 km in length and is surrounded by sands such as the recently formed Aralkum Desert on the dry bed of the Southern Aral Sea (Great Aral Sea) in the southwest, and the vast Pre-Aral Karakum and Kyzylkum Deserts in the north and the south, respectively.
The source of the Syr Darya Delta waters is the Syr Darya River, the second-largest transboundary river in Central Asia in terms of flow volume, with a catchment area of 219,000 km^2^ [75].
In contrast to the Ili Delta at Lake Balkhash, which has remained largely undisturbed, the Syr Darya Delta at the NAS has undergone considerable changes. In the late 1960s, under the former Soviet Union, it was reclaimed as a large irrigation zone known today as Qazaly (Kazaly), resulting in the development of a network of irrigation canals of various sizes that significantly reshaped the delta’s hydrographic network. This transformation has caused the landscape to lose much of its natural “deltaic” appearance [76], as river branches were cut off from the mainstream or straightened into canals. The irrigation zone, with its area of 15,000 km^2^, has occupied most of the deltaic area and has become one of Kazakhstan’s most important rice cultivation areas, also known for having some of the northernmost rice paddies on the planet [65,77]. However, constructing numerous artificial reservoirs and canals upstream between 1950 and 2017, primarily to supply water for rice and other irrigated cultures such as cotton, has led to reduced river runoff, water shortages downstream, soil salinization, and abandoned land [78,79,80].
Despite the desiccation of the Aral Sea, the environmental and socio-economic situation in the area around the Aral Sea has improved during the last decade, particularly in its northern part. According to Micklin [81], Oskenbayeva [82], and White and Micklin [83], this positive change results from international and local efforts, including the implementation of “Syr Darya control and Northern Aral Sea” (SYNAS) projects. The reconstruction of the Kok-Aral dam and dike complex southwest of the Syr Darya Delta has helped halt the desiccation of the northern part of the Aral Sea. Furthermore, regulatory dams have been established in the delta to ensure that the whole delta, including its lakes and wetlands, now receives more water, especially during the flooding season in spring and early summer [84]. Though the water supply by the Syr Darya River to the delta varies each year, these efforts listed above have positive effects on the wetland ecosystems of the delta, benefiting, among others, the Phragmites australis stands [65].
4.2. Field Surveying and In Situ Above-Ground and Above-Water Surface Reed Biomass Sampling
This study involved two field survey campaigns in the Syr Darya Delta, with the first conducted in July and August 2019, followed by the second in September and October 2020. Ground-truth data pertaining to land cover, land use, and standing Phragmites biomass above ground and above the water surface were collected during both field survey campaigns.
Based on specific ecological and morphological characteristics of the landscape of the Syr Darya Delta, three main investigation areas were selected following an extensive analysis of high-resolution satellite imagery and a comprehensive literature review. These areas, featuring open water, marshes, and meadows characterized by significant reed vegetation, were identified in the following locations of the delta: (a) adjacent to the Kok-Aral dam and dike complex; (b) around deltaic lakes such as Aidarkol and Kotankol next to Bekarystan Bi Village; and (c) along the left branch of the Syr Darya River, next to Tasaryk and Lakaly Villages as well as around the Maryamkol Lake close to Kaukey Village.
These areas correspond reasonably well to the Seaside right-bank and left-bank Lake Systems, the Akshatau Lake System, and the Aksai-Kuandarya Lake System of the Syr Darya Delta, as delineated in the work of Kipshakbaev et al. [84]. A map illustrating these investigation areas, along with ground-truth reference points recorded using a handheld Garmin Oregon 750 Global Positioning System (GPS) device (Garmin International Inc., Olathe, KS, USA), which has a reported positioning accuracy of approximately 5 m, is presented in Figure 5.
To ensure the accurate and effective collection of reference ground-truth data, we developed a preliminary sampling scheme based on the concept of stratified random sampling prior to the field campaigns. In this scheme, we identified five primary strata corresponding to different land cover classes: open water, submerged dense reed, non-submerged dense reed vegetation, open reed areas with shrubs, and bare land with open sands (refer to Table 3 for a detailed overview). The stratification was established based on ecological relevance and the variability of habitats supporting common reed’s growth.
Areas representing land cover classes with open water and bare land without reed vegetation were used as grounds for taking reference points, while areas corresponding to land cover classes with reed vegetation were used for biomass sampling. This resulted in a dataset comprising 283 ground-truth points, with 205 related to information on land cover and land use, and 78 representing sampling plots with measured standing above-ground and above-water Phragmites biomass, as summarized in Table 4.
For biomass sampling, sites featuring homogeneous reed vegetation of at least 20 m by 20 m were selected for positioning reed biomass sampling plots to ensure conformity with the resolution of the Sentinel-2 images. These plots were marked using a GPS device, and quadrants of 1 m × 1 m size in four to five regular replicates were then spanned over each of the Phragmites australis-containing strata to measure the standing biomass of reed after [9]. Therefore, these sampling quadrants covered the range of land cover classes from dense submerged reed n = 26 and dense non-submerged reed n = 26 to sparse reed mixed with shrub vegetation n = 26 (refer to Table 1). To avoid mixed pixels or edge effects, the sampling plots were placed in the centers of large pre-selected sites with homogeneous reed vegetation.
Following [33], we calculated the stem density in each quadrat based on the living stems by counting all living and dead stems. Ten reed stems closest to the diagonal line were randomly chosen and cut 2–3 cm above the ground or water surface, like in the cases of sampling of submerged reeds accomplished by using a boat or by standing in the water in a water-proof wader, depending on the water depth and accessibility. After clipping, the leaves were separated, and stems were chopped into smaller pieces to fit into the oven for drying at 105 °C for 24 h and weighing afterward to determine the dry biomass weight.
The above-ground and above-water surface stand biomass was then calculated by multiplying the average dry weight of one plant—derived from the dry weight of the ten sampled plants—with the previously counted stem density in each sampling quadrant. The resulting yield values were then converted to yields per area of one hectare.
Upon completion of the collection and pre-processing of data, descriptive statistical analysis was performed on Phragmites australis’s measured above-ground and above-water surface biomass data.
4.3. Satellite Data Acquisition and Spectral Index Calculation
In our study, we used multispectral satellite data from Sentinel-2A and -2B Level-2A, which are part of the Copernicus program by the European Space Agency (ESA). Developed for terrestrial research applications, these satellites provide open-access high-resolution imagery with frequent (3–5 days) revisit times, which makes them particularly valuable for monitoring dynamic ecosystems such as wetlands.
To cover the Syr Darya Delta, we needed four separate Sentinel-2 tiles of 110 × 110 km^2^ each (with footprints IDs: T41TLM, T41TMM, T41TLL, and T41TML). These tiles were used to generate a mosaic of input raster files for each band for every date.
In total, we obtained 24 cloud-free images from the ESA Copernicus Open Access Hub website (the old address: https://scihub.copernicus.eu/; accessed on 18 February 2020 the new address: https://dataspace.copernicus.eu/, accessed on 21 August 2023 ), covering the dates of our field surveying and Phragmites biomass sampling in 2019 and 2020. These spaceborne products are provided with orthorectified Bottom-of-Atmosphere (BOA) reflectance, i.e., atmospherically corrected surface reflectance, and ready-to-use imagery in cartographic geometry spanning 13 spectral bands with spatial resolution ranging from 10 to 60 m, as shown in Table 5.
For our modeling, four bands (blue—Band 2; green—Band 3; red—Band 4; and NIR—Band 8) with a spatial resolution of 10 m × 10 m were extracted from each imagery tile and merged into a single mosaic (see Appendix B containing R codes sample as supplementary material for understanding and reproducing the satellite data processing and preparation predictor variables for the RF modeling). These bands were then used to compute two remote sensing indices: NDVI (Equation (1)) and NDWI (Equation (2)). These indices and the four bands have been selected as predictors for reed biomass modeling because they have proven to be suitable predictors for biomass in previous studies, such as Thevs et al. [24], Mogano [85], and Tiškus et al. [86].
We calculated the indices using the following equations:
where ρNir is the pixel value in the near-infrared band, ρRed is the pixel value in the red band, and ρGreen is the pixel value in the green band.
In order to exclude the pixels representing settlements and croplands from the modeling process, these areas were identified and manually digitized using the most recent digital cadaster data available on the official state-initiated web portal Qoldau.kz, which monitors agricultural land usage on the territory of Kazakhstan [87,88]. Subsequently, these areas were masked out from the input raster dataset.
These digitization procedures, as well as imagery processing, mosaicking, indices calculations, and other data curations to build prediction models, were carried out using R Studio 2023.06.1 Build 524, QGIS 3.16.14, and ArcGIS Pro 3.2.2.
4.4. Overview of the Data and the Study Workflow
Two types of data were used to conduct this study: reference ground-truth records and high-resolution Sentinel-2 satellite images, which were then combined to assemble the input datasets for the Random Forest (RF) regression modeling, as shown in Figure 6.
In the first step, 78 reed biomass and 205 ground-truth points on the land cover were collected with a stratified sampling technique during field surveying.
In the second step, model input datasets were thoroughly prepared by conflating the dry biomass weight of sampled reed along with the collected points on the land cover as a response variable dataset, and a raster batch of Sentinel-2 spectral channels together with spectral indices compiled for the study area extent as a predictor variables (or “features”) dataset. While the spectral channels included blue (B2), green (B3), red (B4), and near-infrared (B8), the computed indices were the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI).
In the third step, these two input datasets were combined in a machine learning-based regression approach to build and train Random Forest regression models. Predictive reed biomass mapping in the wetlands of the Syr Darya Delta was then carried out using the model with the highest testing accuracy, which was determined through feature selection and hyperparameter tuning after iterative learning of the input variable relationships.
In the fourth step, the predictive performance of the Random Forest regression models, as well as the uncertainty assessment and the quality of the generated map, was evaluated using a ten-fold cross-validation technique.
4.5. The Random Forest Modeling, Application, and Performance Assessment
To model a spatially explicit yield map for Phragmites australis-dominated wetlands and quantify the area and standing biomass of reed beds in the Syr Darya Delta, Kazakhstan we employed multiple Random Forest regression models, leveraging the strengths of one of the most effective tree-based machine learning algorithms. The selection of Random Forest was driven by its robust modeling capabilities, proficiency in processing complex datasets, and generalization abilities, making it especially an effective method widely used to estimate above-ground biomass under a wide range of biophysical contexts to this day.
Unlike traditional parametric methods, Random Forest does not rely on distribution assumptions of the relationship between the predictors and the response variable [89,90]. This makes it capable of handling outliers and managing noisy and highly correlated predictor variables, thus providing accurate predictions of above-ground biomass with variable importance measures and unbiased error estimates [31]. Additionally, it effectively copes with skewed data distributions, such as in the case of our study.
The algorithm constructs numerous independent decision trees (ntree) without pruning using a randomly selected two-thirds training sample from the original dataset by the bootstrapping (multiple random sampling with replacement) method controlled with the node size set by the user [91]. It then tests a random subset of predictor variables at each node in the trees to identify the most efficient split. The final prediction is derived by aggregating and averaging the prediction votes of all the individual trees. The remaining one-third of the original dataset, known as out-of-bag data, stays unseen by the models as an internal validation sample for testing the models’ behavior beyond the data from which they were built. It is then used to assess both model outputs, the mean square error, and node purity.
For our analysis, we built 48 Random Forest above-ground biomass regression models (see Appendix A for more details) using the “rf” function within the “caret” R environment software package v.6.0.94 (see Appendix C containing R codes sample as supplementary material for understanding and reproducing the Random Forest regression modeling) and the Forest-Based Classification and the Regression Tool in ArcGIS Pro 3.2.2. The continuous values of above-ground and above-water surface reed biomass data led us to execute the Random Forest regression models after examining the relationships between reed biomass data and each remote sensing variable (in Section 2.1.).
The optimal parameters were determined with ntree set to 1000 and mtry calculated as two of the total number of variables and a minimal node size of five. This configuration generated the minimized out-of-bag Mean Square Error (MSE) and facilitated the estimation of above-ground and above-water surface biomass of reed on the Syr Darya Delta wetlands using Sentinel-2 images and reed biomass data. To avoid overfitting the models and evaluate their performance and accuracy more precisely, the k-fold cross-validation technique was implemented by setting up the train control for ten folds. This technique involved dividing the data into ten folds or train/test sets so that each point in the dataset occurred exactly once in one of the ten test sets. This allowed us to create a test set that is the same size as the training set but is composed of out-of-sample predictions. Each fold is then randomly assigned to its single test set, avoiding systematic biases in the data.
The assessment of the regression model fit included the computation of the percentage of variance explained by the model, the coefficient of determination (R^2^) (Equation (3)), and the root mean square error (RMSE) (Equation (4)) according to the following equations:
where is an actual field-measured above-ground and above-water surface biomass value in the ith sample, is a predicted above-ground and above-water surface biomass value and represents the mean simulated estimated above-ground and above-water surface biomass for all tested sample points, and indicates the size of the samples in the different datasets.
5. Conclusions
This research work aimed to characterize the resource potential of common reed (Phragmites australis) in the Syr Darya Delta, Kazakhstan, through a comprehensive modeling approach utilizing high-resolution Sentinel-2 satellite imagery, spectral indices, and ground-truth above-ground and above-water reed biomass measurements, all applied within multiple Random Forest regression frameworks. The predictive models exhibited high performance, attaining R^2^ values of 0.93 (RMSE = 2.74 t ha^−1^) during calibration and R^2^ = 0.83 (RMSE = 3.71 t ha^−1^) in validation. Models using the Normalized Difference Vegetation Index (NDVI) and Normalized Difference Water Index (NDWI) as key predictors generated precise and spatially explicit yield maps of Phragmites australis-dominated wetlands, distinctly outlining the standing biomass of common reed across different types of stands and growing sites. This methodological framework enabled accurate and cost-effective quantification of the reed’s spatial extent and biomass resources.
In the analysis, the total above-ground and above-water biomass of Phragmites australis in the study area for 2019–2020 was estimated to be approximately 2.5 million tons. The area of dense reed beds, both submerged and non-submerged, with standing biomass exceeding 10.5 tons per hectare, was calculated to cover 58,935 hectares, representing an estimated biomass of 1,240,789 tons. This underscores the substantial biomass potential inherent in the region’s wetlands.
The practical implications of this research extend beyond the confines of the Syr Darya Delta in Kazakhstan, offering a scalable methodology that can be applied to assess similar wetland ecosystems worldwide. However, while integrating Random Forest regression modeling with high-resolution Sentinel-2 data and in situ biomass samples has proven effective in mapping and quantifying above-ground and above-water biomass in expansive, remote, and dynamic regions like the Syr Darya Delta, characterized by variable hydrological conditions and limited data access, there remains a significant limitation in capturing seasonal dynamics and understanding inter-annual variability in reed biomass due to the short temporal scope of this work.
Future investigations with larger temporal datasets are thus essential for long-term biomass monitoring and understanding their trends. By highlighting these, this study draws attention to further research in tracking biomass dynamics over extended periods and formulating comprehensive and sustainable strategies for managing the distribution and utilization of reed beds, as well as integrating these frequently neglected biomass hotspots into landscape planning initiatives.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Huston M.A. Wolverton S. The Global Distribution of Net Primary Production: Resolving the Paradox Ecol. Monogr.20097934337710.1890/08-0588.1 · doi ↗
- 2Convention on Wetlands Global Wetland Outlook: Special Edition 2021 Secretariat of the Convention on Wetlands Gland, Switzerland 202156
- 3Zelnik I. Germ M. Diversity of Inland Wetlands: Important Roles in Mitigation of Human Impacts Diversity 202315105010.3390/d 15101050 · doi ↗
- 4Troia A. Macrophytes in Inland Waters: From Knowledge to Management Plants 20231258210.3390/plants 1203058236771666 PMC 9921133 · doi ↗ · pubmed ↗
- 5Murphy K. Efremov A. Davidson T.A. Molina-Navarro E. Fidanza K. Betiol T.C.C. Chambers P. Grimaldo J.T. Martins S.V. Springuel I. World Distribution, Diversity and Endemism of Aquatic Macrophytes Aquat. Bot.201915810312710.1016/j.aquabot.2019.06.006 · doi ↗
- 6Lesiv M.S. Polishchuk A.I. Antonyak H.L. Aquatic Macrophytes: Ecological Features and Functions Stud. Biol.202014799410.30970/sbi.1402.619 · doi ↗
- 7Thomaz S.M. Ecosystem Services Provided by Freshwater Macrophytes Hydrobiologia 20238502757277710.1007/s 10750-021-04739-y · doi ↗
- 8Srivastava J. Kalra S.J.S. Naraian R. Environmental Perspectives of Phragmites australis (Cav.) Trin. Ex. Steudel Appl. Water Sci.2014419320210.1007/s 13201-013-0142-x · doi ↗
