Microclimates, land cover, and socioeconomic vulnerability shape Anopheles hotspots in Maryland, USA
Chibuike Chiedozie Ibebuchi, Somtochukwu Stella Onwah, Itohan-Osa Abu

TL;DR
This study identifies where Anopheles mosquitoes are most active in Maryland and how factors like habitat, climate, and wealth influence their presence.
Contribution
The study introduces a fine-scale analysis combining environmental and socioeconomic factors to map Anopheles hotspots in Maryland.
Findings
Prince George’s and Anne Arundel Counties are primary hotspots for Anopheles mosquitoes.
Woody wetlands, low impervious surfaces, and humid microclimates are key predictors of mosquito presence.
Affluent suburban areas with favorable habitats show higher Anopheles activity despite lower socioeconomic vulnerability.
Abstract
Anopheles mosquitoes pose notable public health concerns as competent vectors of malaria and other diseases. Although malaria is no longer endemic in the United States, recent locally acquired cases in states including Maryland highlight the need to better understand Anopheles dynamics in the region. This study aimed to identify geographic hotspots of Anopheles presence in Maryland and evaluate how land cover, microclimatic conditions, and socioeconomic vulnerability shape their spatial and temporal distribution. Monthly Anopheles occurrence data (1999–2024) from Global Biodiversity Information Facility (GBIF) were aggregated at county and Census Block Group (CBG) scales. Counties were ranked by mean annual presence to identify hotspots. Associations with land cover, microclimatic, and socioeconomic conditions were assessed using Spearman’s rank correlation (ρ; P < 0.05). At the CBG…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMalaria Research and Control · Mosquito-borne diseases and control · Species Distribution and Climate Change
Background
Mosquitoes are among the most medically significant arthropods worldwide, responsible for transmitting pathogens that cause diseases such as malaria, West Nile fever, Zika disease, chikungunya, and various forms of encephalitis [1–4]. The public health burden is substantial: globally, mosquito-borne diseases cause hundreds of millions of infections and over 700,000 deaths each year [5, 6]. While malaria has historically been concentrated in tropical regions, changes in climate, land use, and human mobility have expanded the geographic ranges of several mosquito species, raising concerns even in temperate regions where these vectors were previously less predominant [7, 8]. In the United States (US), although malaria transmission is rare compared to endemic countries, locally acquired cases still occur, and arboviral diseases such as West Nile fever are persistent seasonal threats [9, 10].
Anopheles mosquitoes, while historically associated with malaria transmission, also contribute to the ecology of other pathogens [11]. In the US, several mosquito species are established across the Mid-Atlantic region, including Maryland, where environmental and socioeconomic conditions can contribute to localized hotspots of vector activity [12–14]. Maryland’s diverse landscape, ranging from coastal plains to urban centers, combined with its variable climate and dense human populations, creates heterogeneous habitats for mosquito breeding [14, 15]. Moreover, seasonal patterns of temperature, precipitation, and humidity influence mosquito life cycles and presence [16].
Vector surveillance in Maryland is conducted through state and local mosquito control programs, which monitor arboviral pathogen presence in mosquito pools to inform spraying and source reduction. However, routine monitoring often focuses on areas with historically high complaints or vector-borne disease activity, which may overlook other emerging at-risk communities [17–19]. Traditional entomological surveys, while essential, are costly, spatially limited, and often lack integration with socioeconomic vulnerability assessments [18, 19].
Existing research on mosquito ecology in the Mid-Atlantic has largely concentrated on climatic drivers, such as the influence of temperature and precipitation on seasonal mosquito presence [20]. Warmer temperatures can extend breeding seasons, while precipitation affects the availability of larval habitats [16]. Urban heat islands and high levels of impervious surface cover have also been linked to increased mosquito presence in some Aedes species, though these relationships can vary by genus and species [21]. In contrast, comparatively less attention has been paid to applying nonlinear explainable machine learning (ML) models to examine how socioeconomic factors interact with environmental conditions to shape mosquito presence at finer neighborhood scales in the Mid-Atlantic.
According to the CDC [22] and Maryland Department of Health [23], the US reports approximately 2000 imported malaria cases annually, including around 200 in Maryland. In 2023, Maryland confirmed one locally acquired case of Plasmodium falciparum, a parasite commonly transmitted by female Anopheles mosquitoes. Additionally, there is research on environmental and socioeconomic drivers of mosquito ecology in the state, particularly for Aedes [13, 14, 18]. However, up-to-date studies that quantify Anopheles risk at county or Census Block Group (CBG) scales in Maryland remain limited or absent.
Most studies of mosquito presence in the Mid-Atlantic have focused on Culex species, given their role in transmitting West Nile virus and other arboviruses [10, 24]. In contrast, Anopheles mosquitoes have received less attention in recent decades, partly because malaria has been eliminated as an endemic disease in the US [25]. However, recent reports of locally acquired malaria cases in states such as Maryland, Florida, Texas, and Arkansas indicate the need to better understand the dynamics of Anopheles mosquitoes, even in regions where malaria is no longer a major public health concern [26, 27]. This study provides one of the few county- and sub-county-level analyses of Anopheles presence in Maryland, combining long-term occurrence data with environmental and socioeconomic predictors.
We integrate occurrence records from Global Biodiversity Information Facility (GBIF) to address the following objectives: (1) identify geographic hotspots of Anopheles presence across Maryland at both county and CBG level; and (2) quantify how local environmental, climatic and socioeconomic factors shape mosquito presence. This multi-scale analytical framework combines a broad statewide perspective with fine-grained local analysis at the neighborhood scale, offering actionable insights for targeted vector control strategies and optimized resource allocation.
Methods
Our analysis integrates multiple datasets capturing mosquito occurrence, environmental conditions, and socioeconomic characteristics across Maryland, US (Fig. 1). These datasets span different spatial and temporal resolutions (Table 1), enabling both county and CBG level investigations. CBGs are small U.S. Census Bureau statistical units, each containing roughly 600–3000 people, offering finer spatial resolution than counties. This granularity enables detection of neighborhood scale environmental and socioeconomic variation that may be masked in county-level averages, improving the ability to identify and analyze localized Anopheles hotspots.Fig. 1. Geographical location of Maryland in the US (in red). The pinch-out image highlights the counties in Maryland. Map was created by authors using QGISTable 1List of datasets used in this study and their relevance to mosquito spatial presenceDatasetSourceYear/VersionRelevance to mosquito spatial presenceMosquito occurrence point recordsGBIF.org. https://www.gbif.org/Jan 1999–Dec 2024Provides georeferenced presence records of Anopheles mosquitoes from multiple sources (museum collections, academic surveys, state monitoring), used to quantify spatial and temporal patterns at county and CBG scalesCBG-level socioeconomic indicators (ADI)University of Wisconsin School of Medicine and Public Health, Area Deprivation Index v4.0.1 [31]. https://www.neighborhoodatlas.medicine.wisc.edu/2023Fine-scale (CBG) composite ranking of socioeconomic disadvantage, enabling within-county mapping of areas potentially more vulnerable to mosquito-borne diseaseCounty-level socioeconomic indicators (SVI)CDC/ATSDR Social Vulnerability Index. https://svi.cdc.gov/dataDownloads/data-download.html2022Captures social and economic conditions (e.g., poverty, unemployment, housing, transportation) influencing neighborhood maintenance, exposure risk, and capacity for mosquito control at the county scaleClimate variablesPRISM Climate Group, 30-Year Normals (1991–2020) at 800 m spatial resolution https://prism.oregonstate.edu/normals/1991–2020 climatologyMinimum/maximum air temperature, mean precipitation, and vapor pressure deficit characterize climatic conditions affecting mosquito development, survival, and activityElevationUSGS 10 m Digital Elevation Model, National Map Downloader. https://www.usgs.gov/the-national-map-data-delivery/gis-data-downloadLatest available, as of 2025Influences temperature, moisture retention, and hydrological patterns relevant to mosquito habitat suitabilityPercent impervious surface30 m USGS National Land Cover Database (NLCD), MRLC. https://www.usgs.gov/centers/eros/science/annual-national-land-cover-database2024Indicates the proportion of land covered by impermeable materials, which can influence water pooling and breeding site availability and land cover categoriesGBIF Global Biodiversity Information Facility, SVI Social Vulnerability Index, CDC Centers for Disease Control and Prevention, ATSDR Agency for Toxic Substances and Disease Registry, CBG Census Block Group, ADI Area Deprivation Index, PRISM Parameter-elevation Regressions on Independent Slopes Model, USGS United States Geological Survey, DEM Digital Elevation Model, NLCD National Land Cover Database, MRLC Multi-Resolution Land Characteristics Consortium
Mosquito occurrence data
We obtained Anopheles mosquito presence records (i.e. including An. punctipennis, An. crucians, and An. quadrimaculatus) for Maryland from GBIF [28], covering January 1999 to December 2024 – when non-zero presence records were predominantly recorded. GBIF compiles occurrence georeferenced records from multiple sources, including museum collections, academic surveys, and state monitoring programs. These Anopheles species were selected due to their relative presence in Maryland and competency as historical vectors of diseases such as malaria and lymphatic filariasis, as well as potential to transmit other diseases [29, 30]. Only records with valid geographic coordinates within Maryland were retained. Each record was assigned to its corresponding county and CBG using the U.S. Census TIGER/Line shapefiles 2024. The dataset was aggregated per county and per CBG to counts of presence observations, enabling characterization of trends, spatial relationships and seasonal cycles.
Data limitations
GBIF occurrence records are presence-only and opportunistic, and therefore subject to non-random sampling bias (e.g., spatial clustering near populated or accessible areas, variable observer effort across space and time, and reporting gaps). Absences cannot be interpreted as true absences. To reduce effort-driven artifacts in the fine-scale analysis, we limited CBG-level modeling to hotspot counties with relatively sufficient observer effort—defined as ≥ 30 unique observer-days over the study window, with coverage distributed across the warm season in multiple years. Nonetheless, residual biases, including pandemic-period disruptions to surveillance and reporting, may remain and are considered in the interpretation of result.
Socioeconomic data
At the county level (Fig. 1), we compiled socioeconomic indicators from the Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry [32]. The SVI is designed to assess relative vulnerability across counties nationally, making it well-suited for inter-county comparisons. We selected measures likely to influence mosquito presence through their effects on neighborhood maintenance, exposure, and prevention capacity. These included (1) overall socioeconomic status (combining poverty, unemployment, income, and education), (2) housing and transportation vulnerability (capturing crowding, multi-unit housing, mobile homes, lack of vehicles, and group quarters); as well as individual components such as (3) the percentage of residents unemployed, or (4) without a high school diploma. Furthermore, housing-related metrics included (5) the share of crowded households, (6) proportion of multi-unit structures, and (7) percentage of mobile homes, all of which can be associated with conditions favorable for mosquito breeding, such as shared infrastructure, limited drainage, and container accumulation. We also considered (8) lack of vehicle access, which may reflect broader neighborhood disinvestment and reduced access to prevention resources.
At the CBG level, we used the 2023 Area Deprivation Index (ADI) from the University of Wisconsin School of Medicine and Public Health [31]. ADI provides a composite measure of socioeconomic disadvantage based on 17 census-derived indicators such as income, education, employment, and housing quality. We utilized ADI state decile (1 = least deprived; 10 = most deprived), which ranks CBGs within Maryland. Because ADI is reported at the CBG scale, it supports within-county analyses and fine-scale mapping of socioeconomic conditions. Higher state deciles indicate greater deprivation and, by extension, potentially greater vulnerability to environmental health risks.
Environmental data
Environmental predictors at the CBG scale were chosen based on established associations between climate, habitat characteristics, and mosquito presence. Climate variables were obtained from the PRISM30-Year Normals, representing the 1991–2020 climatology [33]. This included minimum and maximum air temperature, mean precipitation, and minimum and maximum vapor pressure deficit, the latter serving as an indicator of atmospheric moisture and humidity conditions. CBG-level aggregation of PRISM data with 800 m resolution is aimed to capture microclimatic patterns sufficient for ecological, public health, or urban planning applications that require sub-county climate granularity.
Elevation was sourced from the U.S. Geological Survey (USGS) 10 m Digital Elevation Model through the National Map Downloader (Table 1), while percent impervious surface (and land cover categories), representing the proportion of land covered by materials such as asphalt or concrete, was obtained from the National Land Cover Database (NLCD) through the Multi-Resolution Land Characteristics Consortium [34].
In terms of data harmonization, except for SVI (county), GBIF presence data (point records) and ADI (CBG), all layers were reprojected to a common projected Coordinate Reference System for Maryland (US Contiguous Albers Equal Area, EPSG:5070), resampled as needed, and summarized to the analysis units (area-weighted means/percents within counties or CBGs).
Methods
Study design and analytical framework
We employed a two-scale analytical approach. The first stage was a county-level analysis to identify Anopheles mosquito presence hotspots with sufficient observer efforts and evaluate associations between presence and county-level socioeconomic indicators. The second stage focused on CBG-level analysis within hotspot counties to examine fine-scale drivers using an explainable AI (XAI) framework. At both CBG and county-level, hazard score was derived by taking the mean annual presence per county/CBG, applying a log transformation to reduce skewness, and normalizing values between 0 and 1. Higher hazard scores indicate counties/CBGs with higher recorded Anopheles presence.
County-level analysis
At the county scale, total monthly Anopheles presence counts from January 1999 to December 2024 were aggregated and converted to mean annual values. Counties were ranked by mean annual presence to identify hotspots. For exploratory analysis, associations between mosquito presence and socioeconomic variables were evaluated using Spearman’s rank correlation (ρ), which captures monotonic relationships without assuming normality, alongside Pearson’s correlation for assessing linear associations. Statistical significance was evaluated at the 95% confidence level (P < 0.05).
CBG-level correlation analysis and variable screening
At the CBG scale, we examined environmental predictors and the ADI to assess whether socioeconomic disadvantage and habitat characteristics were associated with mosquito presence. Spearman’s ρ and P-values were computed, and only variables with statistically significant monotonic associations (P < 0.05; |ρ|≥ 0.25) were retained for XAI modeling to support a parsimonious and interpretable framework. This preliminary screening step ensured that predictors with no detectable relationship to Anopheles presence were excluded before model fitting, thereby reducing model complexity and computational overhead. SHAP values were then applied to the retained predictors to provide an objective, model-based ranking of their relative importance.
Explainable machine learning modeling and spatial blocking
To quantify the relative importance of spatial predictors, an Extreme Gradient Boosting regressor (XGBRegressor) [35] was fit to CBG-level Anopheles occurrence counts using the retained environmental, socioeconomic and land-cover covariates from the correlation analysis. We used gradient-boosted trees [35] over count-based models (e.g., Poisson, negative binomial) that assume specific distributional forms, to flexibly capture nonlinearities and interactions among predictors. Their non-parametric nature allows modeling of correlated covariates without strict assumptions. Spatial autocorrelation was controlled with blocked resampling. Within each county, CBG centroids were projected and partitioned into a 4 × 4 quantile grid to define spatial blocks, and GroupKFold held out entire blocks as test folds. Hyperparameters were selected with grouped five-fold cross-validation in GridSearchCV on these same blocks over a stability-oriented grid. This grid included max_depth ∈ {2,3,4}, n_estimators ∈ {400,600,800}, learning_rate ∈ {0.03,0.05,0.07}, subsample ∈ {0.8,0.9}, colsample_bytree ∈ {0.8,0.9}, and reg_lambda ∈ {0.5,1.0,2.0}, optimizing out-of-block R^2^ and reducing risk of overfitting. Models in each fold used the squared-error objective on the raw counts, and performance was summarized on the pooled out-of-block predictions across folds. For interpretability, SHAP (TreeExplainer) values were computed [36]. For each split, SHAP values were obtained only on the held-out blocks and then aggregated across folds to derive global importance (mean |SHAP|) and local effect patterns (beeswarm). This protocol limits information leakage from neighboring units, respects spatial dependence, and ensures all CBGs contribute—either to model fitting or to out-of-block explanation.
Data analysis
Data analysis was conducted in Python version 3.10 (Python Software Foundation, US: https://www.python.org/). Data processing and statistical computations were performed using Python packages including pandas, numpy, and scipy. Similaryl, XAI modeling and cross-validation were implemented using scikit-learn, and model interpretability was assessed using the shap package. Mapping and visualization were carried out using QGIS, matplotlib and Cartopy.
All statistical tests, including the non-parametric Spearman correlation, were evaluated at a significance threshold of P < 0.05. Because our key analytical workflow relies on distribution-free methods—including Spearman’s ρ, gradient-boosted tree models, and SHAP explanations—none of the procedures used in this study required assumptions of normality.
Results
County-level analysis of Anopheles presence
Figure 2 shows the spatial distribution of Anopheles mosquito presence across Maryland counties from 1999 to 2024. Two major hotspots (with sufficient observer effort) emerged—Prince George’s County (hazard score = 1.0) and Anne Arundel County (0.775)—with substantially higher Anopheles presence (7468 and 999, respectively) compared to other counties. Secondary clusters of elevated presence were found in Baltimore, Dorchester, and Carroll counties. Many western and lower eastern shore counties showed low or no recorded Anopheles presence, reflected in hazard scores near zero. This spatial pattern guided the selection of hotspot counties for subsequent fine-scale CBG-level analysis.Fig. 2. Spatial distribution of Anopheles mosquito presence across Maryland counties from 1999 to 2024. The striped counties indicate areas with no recorded presence during the study period
Figure 3 shows the linear spatial relationships between county-level socioeconomic factors and Anopheles mosquito presence across Maryland counties. At the 95% confidence level, crowded housing (r = 0.70), no high school diploma (r = 0.65), unemployment (r = 0.64), and multi-unit structures (r = 0.46) showed statistically significant positive correlations. Other variables, including no vehicle access, housing and transportation, socioeconomic status, and mobile homes, exhibited weaker associations that were not statistically significant at a 95% confidence level. However, the correlations in Fig. 3 failed test of statistical significance at a 95% confidence level using Spearman correlations that is relatively more robust to outliers.Fig. 3. Correlation between socioeconomic factors and Anopheles mosquito presence count in Maryland Counties. Asterisks show significant correlations at 95% confidence level
Next, based on Fig. 2, we focus on the two hotspot counties—Prince George’s and Anne Arundel—for CBG-level analysis. As shown in Fig. 4, mosquito presence in both counties exhibits a clear seasonal cycle, beginning to rise in May, peaking between July and August, and declining from October onwards. At the annual scale, Prince George’s County shows an overall increasing trend in mosquito presence from 2001 to 2020, followed by a marked drop from 2021. Similarly, Anne Arundel County shows a general increase from 2004 to 2019, with a sharp decline from 2020. These sharp declines likely reflect multiple factors, including potential control measures and disruptions in monitoring and reporting.Fig. 4. Annual cycle calculated as monthly sum of mosquito occurrence (top) and annual sum of mosquito occurrence (bottom) for the Prince George’s and Anne Arundel County (1999–2024). Zero presence was recorded in Prince George’s in 2021
Neighborhood-scale level analysis of Anopheles presence
At the CBG scale, approximately 41% of block groups in Prince George’s County and only about 2% in Anne Arundel County have hazard scores greater than zero, indicating a wider spatial extent of reported mosquito presence in Prince George’s compared to other Maryland counties (Fig. 5a and b). The highest hazard scores are concentrated in the northeastern part of Prince George’s County and the northwestern part of Anne Arundel County.Fig. 5. Hazard score of Anopheles presence at the census block group scale in Prince George’s Anne Arundel Counties (a) and across census block groups in the US for the three Anopheles species analysed
Across the three main Anopheles species observed—An. punctipennis, An. crucians, and An. quadrimaculatus—the spatial distribution of hazard scores is fairly consistent (Fig. 5b). All three species show concentration of high hazard scores in the same northeastern Prince George’s and northwestern Anne Arundel hotspots. This consistency likely reflects shared ecological preferences among these species, including similar breeding habitat requirements and sensitivity to local climatic and land cover conditions, which potentially drive their co-occurrence at nearly the same hotspot locations.
Correlation analysis and explainable machine learning modeling of neighborhood-scale drivers of Anopheles presence
Focusing on Prince George’s County, selected due to its large number of CBGs with nonzero hazard scores, Fig. 6 highlights environmental and socioeconomic variables with statistically significant correlations (|ρ| ≥ 0.25,* P* < 0.05) with Anopheles presence. These include the ADI, minimum vapor pressure deficit, minimum air temperature, percentage of impervious surface, mean precipitation, and elevation.Fig. 6. Spatial correlation at census block groups in Prince George’s County, between environmental factors, ADI and mosquito presence. Asterisk (*) shows statistical significance at 95% confidence level
The ADI shows the strongest correlation (ρ = –0.35), indicating that unlike the non-robust county level results, at the neighborhood scale, block groups with less socioeconomic deprivation tend to have environmental conditions that favor mosquito presence. Therefore, while county-level analysis linked socioeconomic vulnerability to mosquito burden, the more reliable CBG-level analysis (robust to outliers and with relatively much higher spatial granularity: n = 24 counties vs 4091 CBG) within Prince George’s County revealed higher mosquito presence in less deprived neighborhoods. Unlike container-breeding mosquitoes (e.g., Aedes for Zika/West Nile viruses), which often proliferate in low socioeconomic status areas with abandoned properties or poor drainage [37], Anopheles need natural, cleaner aquatic habitats [38–40]. This pattern likely reflects fine-scale ecological drivers—such as well-maintained woody wetland habitat, low impervious surface, and humid microclimates—more prevalent in affluent residential zones.
Minimum vapor pressure deficit (ρ = –0.31) and minimum air temperature (ρ = –0.29) are also negatively correlated, suggesting that locations with lower values for these variables—potentially indicative of cooler and more humid microclimates—are associated with hotspots of Anopheles presence. This relationship will be further explored in the discussion section, particularly in relation to land cover and hydrological features. Impervious surface percentage (ρ = –0.27) is similarly negative, reflecting the limited breeding habitat in heavily urbanized, paved environments. This is further supported by the correlations between Anopheles presence and land cover fractions. Woody Wetlands Fraction achieves the highest positive correlations (ρ = 0.33) indicative of the habitat preference of the mosquito. This is followed by Herbaceous Wetlands (ρ = 0.27) and open water (ρ = 0.27). Further, from Fig. 6, mean precipitation (ρ = 0.26) is positively correlated, consistent with the role of water accumulation in supporting mosquito breeding. Finally, elevation (ρ = –0.26) is negatively correlated, suggesting that lower-lying neighborhoods—more prone to water pooling—tend to have hotspots of Anopheles presence.
Figure 7a and b show impact of the predictors in Fig. 6 on mosquito presence and the relative contribution of the predictors using SHAP. Impervious surface percentage emerges as the most influential predictor, accounting for 33.1% of total model impact, followed by minimum vapor pressure deficit (24.6%) and elevation (18.2%). Therefore, while the XAI model (Fig. 7) identified impervious surface as the top predictor, and correlation analysis identified woody wetlands (ρ = 0.33) as the top land cover type. Minimum air temperature ranks fourth (12.0%), consistent with its negative correlation in Fig. 7, while mean precipitation (8.0%) and ADI (4.2%) have smaller, though still meaningful, contributions.Fig. 7. Beeswarm plot from XGBoost-SHAP modeling showing how each predictor impacts Anopheles mosquito presence (a) and the ranking of each predictor’s contribution on the model’s explanatory power (b)
Together, these results in Fig. 7b highlight that land surface characteristics, microclimatic moisture conditions, and socioeconomic context are notable drivers of Anopheles presence hotspots in the analysed county. The SHAP plots in Fig. 7a further confirm the direction of these relationships reinforcing the correlation analysis in Fig. 6: lower minimum vapor pressure deficit (more humid conditions), lower minimum air temperature (cooler microclimates), lower elevation, and lower impervious surface percentages are generally associated with higher predicted Anopheles hazard scores. Conversely, higher mean precipitation is linked with increased hotspots of Anopheles presence, consistent with its role in creating and sustaining breeding habitats. This alignment between statistical correlations and SHAP-based feature effects strengthens confidence in the stability of these predictors and their ecological plausibility.
Discussion
At the county scale, Prince George’s and Anne Arundel Counties emerged as primary hotspots of Anopheles presence with high observer effort. Socioeconomic indicators—unemployment, prevalence of multi-unit housing, lack of a high school diploma, and household crowding—were positively correlated with county-level Anopheles presence (P < 0.05) using Pearson correlations. However, given the small county sample (n = 24), potential dominance by high-occurrence counties, and the loss of fine-scale spatial detail, these associations (which are non-significant using Spearman correlations) may reflect spatial aggregation and risk an ecological fallacy. Consistent with this concern, our CBG-level analysis within Prince George’s County showed a negative correlation between Area Deprivation Index and Anopheles presence (ρ = − 0.35). This scale-dependent contrast highlights the need for cautious interpretation, considering sampling bias, observer effort, and aggregation effects.
The CBG-scale analysis revealed heterogeneity of recorded presence within counties. Prince George’s County stood out, with approximately 41% of its block groups showing nonzero hazard scores. This spatial pattern indicates the importance of fine-resolution analyses for detecting localized hotspots that may be masked in coarser-scale assessments.
A key insight from our analysis is the role of land cover and microclimatic moisture conditions. At the CBG, low impervious surface percentage (33.1% of total SHAP) and minimum vapor pressure deficit (VPDmin, 24.6% of total impact) emerged as important predictors of Anopheles presence hotspots. This was followed by elevation, and minimum air temperature.
The strong positive correlation between VPDmin and minimum air temperature (ρ = 0.98) suggests that lower nighttime temperatures during warm months often coincide with more humid microenvironments. This is further supported by Table S1, which shows strong negative correlations between VPDmin and forested or agricultural land use types: land covers that typically provide shade, reduce local temperatures, and help retain atmospheric moisture. In contrast, the positive correlation between VPDmin and impervious surface fractions in developed areas supports the idea that urban heat retention and reduced vegetation contribute to higher VPD and atmospheric drying, consistent with the urban heat island effect (Table S1).
As shown in Table S1, these moisture-rich microclimates—often associated with shaded wetlands, floodplains, or vegetated lowlands—provide the stable humidity and breeding habitats required for Anopheles survival and reproduction [41]. For our area of assessment, woody wetland, typically associated with saturated soils that provide breeding grounds for larvae, relatively lower temperatures, dense vegetation, and high humidity emerged as potentially the most preferred habitat for Anopheles mosquitoes. Moreover, a laboratory study by Kessler and Guerin [42] have demonstrated that Anopheles adults preferentially seek cooler, more humid microhabitats when deprived of water or sugar, using thermohygroreceptor cells to detect favorable conditions. In these settings, lower saturation deficits reduce metabolic stress and improve survival prospects. This behavioral adaptation is consistent with our finding at the CGB-scale that lower minimum temperatures, which often signal more humid microenvironments, are associated with higher hotspots of Anopheles presence in Prince George’s County. The concordance between our field-based analysis and controlled laboratory experiments provides strong biological plausibility for the observed relationship between microclimatic humidity and Anopheles hazard.
These findings help reconcile the seemingly counterintuitive negative correlation between hazard scores and minimum temperatures. While high daytime temperatures during summer (June to August, Fig. 4) promote rapid mosquito development, cooler nighttime minima in certain landscapes may signal the presence of nearby water bodies, wetlands, or dense vegetation that retain humidity, creating microhabitats favorable for Anopheles survival and reproduction [41]. Such moisture-rich environments buffer mosquitoes from desiccation stress and can extend adult lifespan [42]. Our SHAP analysis supports this interpretation, revealing that both humidity proxies (VPDmin) and habitat features (elevation, impervious surface cover) exert strong influence on hotspots of Anopheles presence at neighborhood scales, highlighting the interplay between microclimate and landscape structure in shaping local vector ecology.
As earlier mentioned, seasonal patterns emerged clearly in the temporal analysis. At the hotspot counties, mosquito presence peaked between July and August, following an increase from May and declining sharply from October. This pattern reflects the combined influence of seasonal warmth—accelerating mosquito development [16]—and localized humidity [42]—extending survival and breeding potential. Together, these seasonal and microclimatic drivers contribute to explaining why hotspots of Anopheles presence is concentrated in specific sub-county zones despite relatively uniform summer warmth across the broader region.
Finally, our study highlights the novelty and importance of fine spatial-scale analysis. By working at the CBG level, we were able to identify micro-scale patterns—such as the link between cooler nighttime temperatures, higher humidity, and Anopheles presence —that would be obscured in coarser analyses. This granularity is particularly valuable for targeted surveillance and control strategies, allowing public health agencies to focus resources on the most at-risk neighborhoods (Figs. S1 and S2) rather than entire counties.
Limitations of our study include the reliance on GBIF presence-only occurrence data, which are subject to non-random sampling bias. Identified hotspots may partly reflect areas of high sampling effort rather than true mosquito abundance, while zero-record counties may simply be undersampled. The strong county-level correlations observed (Fig. 3) are also sensitive to spatial aggregation and outlier influence, whereas the finer CBG-level analysis captures more realistic ecological patterns of the species. The sharp post-2019 declines (Fig. 4) may reflect reduced surveillance or reporting during the COVID-19 pandemic, rather than ecological change. Additionally, although environmental predictors were derived from high-resolution datasets, they may not fully represent small-scale breeding habitats such as artificial containers, irrigation systems, or microclimatic variations. Finally, we acknowledge temporal mismatch among data sources (Table 1); our approach intentionally uses the most recent, high-quality covariate snapshots as time-stable environmental baselines for spatial modeling. Despite these limitations, the combined use of correlation analysis, SHAP-based model interpretation, and multi-scale spatial analysis offers a robust and interpretable framework for understanding Anopheles dynamics in a temperate setting.
Building on these limitations, several methodological refinements could strengthen future work. First, observer effort could be modeled more explicitly by incorporating effort proxies—such as observer-days, program-based versus opportunistic records, or structured surveillance subsamples—as covariates or offsets to better separate true ecological signal from sampling intensity. Second, although our use of long-term climatological normals provides a stable environmental baseline for defining typical conditions across Maryland, future analyses could complement this approach with temporally resolved covariates or decadal subsets to capture interannual variability. Third, at the county scale, multivariate frameworks (e.g., penalized regression or hierarchical models) could account for confounders such as population density or urbanization, surpassing the reported limitations of pairwise correlations in small samples. Finally, alternative distributional assumptions for count data—such as Poisson, negative binomial, or zero-inflated models—could be tested alongside gradient-boosted regression to benchmark predictive performance and assess residual spatial autocorrelation. Collectively, these enhancements would allow future versions of this framework to more fully disentangle ecological patterns from sampling biases and to advance hazard estimation for Anopheles mosquitoes in temperate environments.
Conclusions
This study combined long-term GBIF occurrence records with explainable machine learning and correlation analysis to examine the environmental and socioeconomic factors influencing the presence of Anopheles mosquitoes (An. punctipennis, An. crucians, and An. quadrimaculatus) across Maryland at both county and Census Block Group (CBG) scales. The analysis identified Prince George’s County as the primary hotspot of Anopheles presence in the state, supported by sufficient observer effort to allow robust spatial inference. Seasonal patterns further indicated that warm summer temperatures provide optimal climatic conditions for Anopheles proliferation in identified hazard hotspots. At the CBG scale, favorable breeding habitats—including woody wetlands, herbaceous wetlands, and open water—together with low minimum vapor pressure deficit emerged as the most influential environmental factors shaping mosquito suitability. These were followed by lower elevation, lower minimum temperatures, and higher precipitation, which collectively contribute to the persistence of humid, water-rich microenvironments conducive to mosquito development. Socioeconomic analysis using the Area Deprivation Index suggested that Anopheles mosquitoes may be more prevalent in CBGs with lower levels of social deprivation, contrasting with container-breeding mosquitoes such as Aedes aegypti, which are often associated with higher deprivation and urban decay. Overall, the findings demonstrate the value of integrating occurrence data with interpretable machine learning to resolve multi-scale patterns of Anopheles presence, providing a foundation for spatially targeted vector surveillance and public health preparedness. Future work will extend this framework by using the most informative feature combinations identified here to train predictive models capable of estimating Anopheles hazard across CBGs throughout the United States.
Supplementary Information
Supplementary file 1.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1CDC. Notes from the Field: Locally Acquired Mosquito-Transmitted (Autochthonous) Plasmodium falciparum Malaria — National Capital Region, Maryland, August 2023. https://www.cdc.gov/mmwr/volumes/72/wr/mm 7241 a 3.html. Accessed 9 Aug 2025.
- 2Maryland Department of Health. Maryland Department of Health announces positive case of locally acquired malaria. https://health.maryland.gov/newsroom/Pages/Maryland-Department-of-Health-announces-positive-case-of-locally-acquired-malaria.aspx. Accessed 10 July 2025.
- 3GBIF.org. GBIF Occurrence Download. 2025. https://gbif.org/
- 4Maryland Department of Agriculture. Mosquito Control. Retrieved August 11, 2025, from https://mda.maryland.gov/plants-pests/pages/mosquito_control.aspx. Accessed 10 July 2025.
- 5Kind AJH. Area Deprivation Index (ADI) (Version 4.0.1) . University of Wisconsin School of Medicine and Public Health Neighborhood Atlas. 2023. https://www.neighborhoodatlas.medicine.wisc.edu/download
- 6Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry/Geospatial Research, Analysis, and Services Program (CDC/ATSDR). (2022). CDC/ATSDR Social Vulnerability Index 2022 Database: United States. https://www.atsdr.cdc.gov/place-health/php/svi/svi-data-documentation-download.html
- 7PRISM Climate Group (2020) 30-year Normals: https://prism.oregonstate.edu/normals/
- 8Multi-Resolution Land Characteristics Consortium (MRLC 2024) NLCD: . https://www.mrlc.gov/data
