EyeInvaS: Lowering Barriers to Public Participation in Invasive Alien Species Monitoring Through Deep Learning
Hao Chen, Jiaogen Zhou, Wenbiao Wu, Changhui Xu, Yanzhu Ji

TL;DR
EyeInvaS is a deep learning system that allows citizens to monitor invasive species using mobile phone photos, improving public participation and accuracy in ecological protection.
Contribution
A novel AI-powered system for public participation in invasive species monitoring, validated through real-world deployment and benchmarked deep learning models.
Findings
EfficientNetV2 achieved 83.66% and 93.32% F1-scores on original and hybrid datasets, respectively.
Recognition accuracy was highest when targets occupied 60% of the frame against simple backgrounds.
EyeInvaS enabled mapping of Solidago canadensis in Huai’an, China, showing strong associations with riverbanks and roads.
Abstract
Invasive species pose serious threats to global biodiversity, agriculture, and ecosystems. Public participation offers an effective way to achieve large-scale and long-term monitoring, yet limited professional knowledge often reduces identification accuracy. This study introduces EyeInvaS, an intelligent image recognition system that enables citizens to identify and monitor invasive species simply by taking photos with their mobile phones. Using over ten thousand images—collected from online sources and synthetically generated under different scales and backgrounds—we built nine representative recognition models based on transfer learning and identified the optimal model and target scale through comparative analysis. The integrated EyeInvaS system supports key functions such as field reporting, rapid recognition, geographic tagging, and data sharing. Its reliability was validated…
Funding: National Natural Science Foundation of China
1. Introduction
The accelerating pace of globalization and increased cross-border transportation have significantly facilitated the human-mediated spread of species across geographic boundaries [1]. In the absence of natural enemies or ecological constraints, alien species with high adaptability and reproductive capacity can readily establish invasive populations, disrupting local ecosystems over the long term [2]. According to recent estimates, more than 37,000 alien species have become established worldwide, more than 3,500 of which are invasive alien species (IAS). They span a wide taxonomic range, from vascular plants and vertebrates to insects, mollusks, and microorganisms, reflecting an increasingly multi-taxa, multi-niche invasion pattern [3].
The ecological, economic, and health-related risks posed by IAS have emerged as a global policy concern [4]. These species often outcompete natives for resources and niches, leading to ecosystem degradation and the collapse of native communities [5,6]. Economically, IAS result in billions of dollars in losses annually across agriculture, forestry, fisheries, and aquaculture sectors [7,8]. Certain species also act as carriers of novel pathogens, posing increased risks to public health [9]. Given these threats, early detection and rapid species identification are essential for reducing response costs and enabling targeted interventions.
Although traditional monitoring technologies are scientifically rigorous, they face critical limitations at large spatial scales or in time-sensitive contexts. Environmental DNA (eDNA) methods are constrained by primer design and degradation rates [10,11]; remote sensing lacks the resolution needed for ground-level species detection [12]; and chemical baiting approaches are susceptible to background noise and temporal variability [13]. In response, public participation has become a valuable means of broadening the scope of invasive species monitoring. However, participants’ limited taxonomic knowledge often restricts the effectiveness of such efforts, making reliable species identification difficult [14]. Global citizen science platforms such as iNaturalist [15] and EDDMapS [16] have advanced crowdsourced IAS monitoring, but they do not fully meet China’s needs: their coverage of the country’s regional IAS is insufficient, and expert verification of submissions can take hours, delaying the early response on which IAS control depends.
Recent advancements in deep learning and computer vision offer new opportunities to enhance species identification by non-expert users. From convolutional neural networks (CNNs) to attention-based Transformers, these models have advanced in their ability to extract complex semantic information from images [17,18,19,20]. Lightweight architectures such as MobileNet and EfficientNet further enable deployment on mobile and edge devices, supporting real-time inference in field conditions [21,22,23]. These models have already shown efficacy in tasks such as species classification [24], ecological monitoring [25,26], and plant disease diagnosis [27,28].
Nevertheless, applying AI-based recognition to citizen science monitoring of IAS faces two major challenges: first, existing tools are often not user-friendly or accessible to the general public; second, field images frequently involve complex backgrounds and variable target scales, which degrade model robustness and accuracy.
To address these challenges, we propose EyeInvaS, a deep learning-powered intelligent recognition system designed to enhance public involvement in ecological surveillance. Our methodology focused on (1) establishing an image database covering invasive species of high concern in China; (2) a comparative evaluation of nine recognition models; and (3) a quantitative assessment of how target scale and background interference affect recognition. The resulting framework combines public-submitted imagery with image acquisition, species recognition, geotagging, and data-sharing functions, bridging the gap between public participation and intelligent ecological monitoring. We validated this approach in a real-world Solidago canadensis surveillance initiative. This work provides a scalable and replicable tool for global IAS surveillance while advancing the integration of citizen participation into biodiversity governance.
2. Data and Methods
2.1. Dataset Construction and Preprocessing
2.1.1. Original Dataset
Based on the January 2023 edition of the List of Key Managed Invasive Alien Species in China, we selected 54 species across six major taxonomic groups—plants, insects, mollusks, fishes, amphibians, and reptiles—that are feasible for image acquisition. Microorganisms were excluded due to their invisibility to the naked eye and limited relevance for citizen detection tasks.
We developed a web crawler in Python (version 3.12.3) to collect image samples of the target species from Baidu Images and Google Images, querying both their common and scientific names. Low-quality and misidentified images were removed through manual screening, and taxonomy experts conducted a secondary review to ensure accuracy. The final dataset included 6109 images, each annotated with metadata including taxonomy, ecological traits, and geographic distribution. These data were also used to populate species profiles in the mobile application (see Table A1).
2.1.2. Multi-Scale and Multi-Background Synthetic Dataset
To assess model robustness under varying environmental conditions, we constructed a synthetic dataset simulating different scenarios and target scales. Specifically, we curated a library of scenario images representing 9 typical habitat types (e.g., riverbanks, forests, hillsides, farmland, and grasslands), standardized to a resolution of 224 × 224 pixels. For each of the 54 species, we prepared silhouette images of the target organism and resized them into 9 scale levels (from 25 × 25 to 200 × 200 pixels), simulating different observation distances.
Using image composition, each target image was overlaid opaquely (without transparency) onto each of the 9 background images to simulate real-world complexity. In total, 4374 synthetic samples were generated (54 species × 9 scales × 9 backgrounds), each representing a unique combination of species, background, and target scale. Figure 1 illustrates the synthetic image set for Solidago canadensis.
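The synthetic design can be expressed as a manifest with one record per (species, target scale, background) combination. The sketch below is a minimal illustration, not the authors’ code: the `SyntheticSample` record, the helper names, and the centred placement of the target are assumptions, since the text does not say where on the background each target was pasted.

```python
from dataclasses import dataclass
from itertools import product

CANVAS = 224  # background side length in pixels (from the text)


@dataclass(frozen=True)
class SyntheticSample:
    species: str
    background: str
    side: int                       # target side length in pixels
    box: tuple[int, int, int, int]  # (left, top, right, bottom) paste box


def centered_box(side: int, canvas: int = CANVAS) -> tuple[int, int, int, int]:
    """Paste box that centres a side x side target on a canvas x canvas background.

    Centred placement is an assumption made for this sketch.
    """
    if not 0 < side <= canvas:
        raise ValueError(f"target side {side} must be in (0, {canvas}]")
    offset = (canvas - side) // 2
    return (offset, offset, offset + side, offset + side)


def build_manifest(species, sides, backgrounds):
    """One record per (species, target scale, background) combination."""
    return [
        SyntheticSample(sp, bg, s, centered_box(s))
        for sp, s, bg in product(species, sides, backgrounds)
    ]
```

Called with the 54 species, the nine target sizes, and the nine backgrounds, `build_manifest` returns 54 × 9 × 9 = 4374 records, the count reported above. The text gives only the 25 and 200 pixel endpoints, so the intermediate sizes must be supplied by the caller. Each `box` could then be passed to Pillow’s `Image.paste(im, box, mask)`, with the cutout’s alpha channel as `mask`, to perform the actual composition.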
2.1.3. Data Augmentation
To improve model generalization and reduce overfitting, we applied geometric data augmentation techniques to the images. These included random rotations and horizontal and vertical flips, thereby increasing sample diversity and enhancing model robustness.
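Geometric augmentation can be illustrated without any imaging library by treating an image as a list of pixel rows. This is a hedged sketch: rotation is restricted to multiples of 90 degrees and a flip probability of 0.5 is assumed, because the text does not state the rotation range or the probabilities used.

```python
import random


def hflip(img):
    """Mirror left-right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror top-bottom (rows are copied so the input is not aliased)."""
    return [list(row) for row in img[::-1]]


def rot90_cw(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img, rng, p_flip=0.5):
    """Random 0/90/180/270-degree rotation, then independent random flips.

    Restricting rotation to right angles is an assumption of this sketch.
    """
    out = [list(row) for row in img]
    for _ in range(rng.randrange(4)):
        out = rot90_cw(out)
    if rng.random() < p_flip:
        out = hflip(out)
    if rng.random() < p_flip:
        out = vflip(out)
    return out
```

In a PyTorch training pipeline the same idea is usually expressed with torchvision’s `RandomRotation`, `RandomHorizontalFlip`, and `RandomVerticalFlip` transforms, which also allow arbitrary rotation angles.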
2.2. Model Development and Performance Evaluation
2.2.1. Model Selection and Training Strategy
We selected 9 representative deep learning architectures covering both convolutional and transformer-based paradigms, ranging from classic high-capacity models to efficient mobile-friendly networks. These include the early CNN AlexNet and deeper models such as VGG16 [29], ResNet50 [18], and DenseNet161 [30], which differ in depth, connection strategies, and feature reuse mechanisms. For mobile deployment, we evaluated lightweight architectures designed for high efficiency with minimal performance trade-offs: MobileNetV2 [23], ShuffleNetV2 [21], and EfficientNetV2 [22]. Finally, we incorporated two transformer-based models, Vision Transformer (ViT) [19] and Swin Transformer (SwinT) [20], which apply global self-attention and hierarchical, shifted-window self-attention, respectively. This selection allows us to benchmark performance comprehensively across network designs and computational demands.
All models were initialized using ImageNet pre-trained weights and fine-tuned via transfer learning. Specifically, the feature extraction layers were frozen, and only the classification layers were retrained to adapt to the multi-class IAS recognition task. The dataset was split into training, validation, and testing subsets at a ratio of 8:1:1. Training was conducted over 100 epochs using the Adam optimizer with an initial learning rate of , a weight decay of and a batch size of 32, with categorical cross-entropy as the loss function. All experiments were performed on a device with an NVIDIA RTX 4090 GPU.
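A minimal sketch of the 8:1:1 split, assuming a simple seeded random split. The text does not say whether the split was stratified by species; a stratified version would apply the same function to each species’ images separately. The function name and seed are illustrative.

```python
import random


def split_811(items, seed=42):
    """Shuffle, then split into train/validation/test at 8:1:1.

    Validation and test each receive floor(10% of n) items; training takes
    the remainder. This rounding rule is an assumption of the sketch.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_hold = len(items) // 10
    test = items[:n_hold]
    val = items[n_hold:2 * n_hold]
    train = items[2 * n_hold:]
    return train, val, test
```

For the hybrid dataset of 6109 original and 4374 synthetic images (10,483 in total), this rule yields 8387 training, 1048 validation, and 1048 test images. The test size matches the 1048 test images reported in Section 3.1.2, although the paper does not state how it rounded.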
2.2.2. Evaluation Metrics
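Model performance was evaluated using four standard metrics (Accuracy, Precision, Recall, and F1-score), defined as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$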
Here, TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. Among these metrics, the F1-score—balancing precision and recall—was used as the primary indicator for model comparison in this multi-class classification task.
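In a multi-class task, TP, FP, FN, and TN are counted per class from the confusion matrix and then averaged. The sketch below assumes macro averaging (an unweighted mean over species), which the text does not specify; weighted averaging would give different values on imbalanced data. Function names are illustrative.

```python
def per_class_counts(cm, k):
    """TP, FP, FN, TN for class k; cm[i][j] counts true class i predicted as j."""
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(len(cm))) - tp  # column k minus diagonal
    fn = sum(cm[k]) - tp                              # row k minus diagonal
    return tp, fp, fn, total - tp - fp - fn


def safe_div(a, b):
    return a / b if b else 0.0


def macro_metrics(cm):
    """Overall accuracy plus macro-averaged precision, recall, and F1."""
    n = len(cm)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp, fp, fn, _ = per_class_counts(cm, k)
        precisions.append(safe_div(tp, tp + fp))
        recalls.append(safe_div(tp, tp + fn))
        f1s.append(safe_div(2 * tp, 2 * tp + fp + fn))
    accuracy = safe_div(sum(cm[k][k] for k in range(n)), sum(map(sum, cm)))
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Per-class F1 is computed as 2TP/(2TP + FP + FN), which equals 2·Precision·Recall/(Precision + Recall) whenever Precision + Recall > 0. Macro-F1 here is the mean of per-class F1 scores, not the F1 of macro precision and macro recall.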
2.3. EyeInvaS: An Intelligent Recognition System for IAS
To facilitate real-world deployment and enhance public participation in monitoring, we developed a mobile application system named EyeInvaS. The overall system architecture is shown in Figure 2 and comprises the following four layers:
- Data storage: Responsible for storing structured information, including invasive species images, taxonomic labels, and biological trait metadata.
- AI Service: Composed of general web services (based on SpringBoot) and the AI model service (built with PyTorch 2.5.1 and Flask 3.1.0). These components communicate through RESTful APIs, ensuring modularity and extensibility.
- Functional modules: Integrates image acquisition, IAS recognition, data sharing, location tagging, and knowledge diffusion. These modules were designed based on a user survey identifying public priorities in invasive species detection.
- User interaction: Represented by the EyeInvaS mobile application, which supports visual recognition of invasive species and serves as the main user interface. The app is built using the Jetpack MVVM architecture to improve code maintainability and device compatibility. It incorporates Mapbox for geolocation and spatial visualization.
Users can take photos or upload images from their gallery. The app then crops the image to an appropriate size (see Section 3.1.3) and calls the AI recognition service, which returns the recognition result.
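The text does not describe exactly how the app crops images before calling the recognition service. One plausible reading, sketched here under that assumption, is a centred square crop, optionally shrunk to match the on-screen framing guide, followed by resizing to the model input size; `guide_crop_box` and its `guide_fraction` parameter are hypothetical.

```python
def guide_crop_box(width, height, guide_fraction=1.0):
    """Centred square crop box (left, top, right, bottom) in pixel coordinates.

    guide_fraction (hypothetical) scales the square relative to the shorter
    image side, e.g. to match an on-screen framing guide; 1.0 gives the
    largest centred square.
    """
    if not 0 < guide_fraction <= 1:
        raise ValueError("guide_fraction must be in (0, 1]")
    side = int(min(width, height) * guide_fraction)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side
```

With Pillow, the box can be applied as `img.crop(box).resize((224, 224))`; 224 × 224 is the synthetic-image resolution from Section 2.1.2 and is assumed here to be the model input size.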
3. Results and Applications
3.1. Results
3.1.1. Model Performance Comparison
To evaluate model robustness under varying image conditions, we trained and evaluated the nine models on both the original dataset and the hybrid dataset (original + synthetic). Figure 3a shows the training and validation loss curves of the nine models on the hybrid dataset. Over the 100 training epochs, the training loss of every model decreased, indicating that all models could fit the training data. Except for VGG16 and ViT, the training loss of every model converged below 0.1, and the gap between training and validation loss stayed below 0.5 throughout, with no obvious overfitting. The training loss of VGG16 also decreased, but it converged to a value above 0.1, indicating weaker convergence. The loss of ViT fluctuated considerably as it decreased, whereas SwinT, also a Transformer, had a smoother loss curve and better performance.
Figure 3b presents the accuracy, precision, recall, and F1-score of each model. All models performed better on the hybrid dataset than on the original dataset, suggesting that synthetic augmentation improved model generalization. Among the CNN-based models, performance improved with network depth. The lightweight networks MobileNetV2, ShuffleNetV2, and EfficientNetV2 achieved a good balance between computational efficiency and classification accuracy. Among the Transformer-based models, SwinT performed comparably to mainstream CNNs, while ViT showed suboptimal results, likely because it relies on large training datasets [20]. EfficientNetV2 achieved the highest F1-scores on both the original (83.66%) and hybrid (93.32%) datasets and was chosen as the system backbone for its strong performance and suitability for mobile deployment.
In real-world monitoring, the public may encounter IAS from different taxonomic groups such as plants, insects, and amphibians. Strong overall performance on the full dataset does not guarantee consistent recognition across these groups: taxonomic groups with markedly different morphology (e.g., amphibians and plants) may place different demands on the model’s feature extraction. To verify the applicability of EfficientNetV2 across diverse taxa and to identify performance differences and potential limitations among them, we analyzed its per-group recognition performance on the test dataset (see Table 1).
EfficientNetV2 also classified strongly across taxonomic groups. Amphibians, the only group containing a single species (Lithobates catesbeianus), showed highly distinguishable morphological features (such as tympanic membranes and webbed feet) across growth stages and shooting angles, with very high annotation consistency; consequently, Accuracy, Precision, Recall, and F1-score all reached 1.00. Fishes and reptiles also performed very well, each with an F1-score of 0.98: despite a few misclassifications, their distinctive morphological features (such as body shape and scale structure) supported near-perfect recognition. Mollusks reached an F1-score of 0.97, showing stable overall performance. Plants and insects both had an Accuracy of 0.93, slightly lower than the other groups. This difference stems mainly from the species complexity of these two groups: the plant group includes 33 species (16 of them in Asteraceae), and the insect group includes 13 species. Morphological similarity among closely related species (e.g., the leaf morphology and inflorescence structure of Bidens pilosa and Chromolaena odorata) readily leads to within-group misclassifications.
3.1.2. Model Explainability and Prediction Visualization
On the 1048-image test split of the hybrid dataset, EfficientNetV2 achieved an overall prediction accuracy of 94.36%, with only 60 misclassifications. Figure 4a shows examples of correctly identified species, often with confidence scores approaching 100%. To examine the model’s internal representations, we visualized the feature distribution of the hybrid dataset using t-distributed stochastic neighbor embedding (t-SNE) (Figure 4b). Features from the same species formed tight clusters, while those from different species were clearly separated, indicating strong class discriminability.
We also employed Grad-CAM and Guided Grad-CAM to visualize the attention regions in misclassified samples (see Table A2). Results indicated that errors were primarily caused by viewing angle bias, small target scale, or background interference. For instance, in the case of Ipomoea cairica, the model focused on the inflorescence while ignoring leaf characteristics, resulting in a misclassification as Phytolacca acinosa.
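Grad-CAM weights each feature map of a convolutional layer by the spatially averaged gradient of the target class score with respect to that map, sums the weighted maps, and applies a ReLU. The dependency-free sketch below implements only that combination step on nested lists. Extracting the activations and gradients from EfficientNetV2 (for example with PyTorch’s `register_forward_hook` and `register_full_backward_hook`), upsampling the map to image size, and the guided-backpropagation part of Guided Grad-CAM are not shown.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heat map from one layer's activations and class-score gradients.

    activations, gradients: K feature maps, each H x W (nested lists).
    Returns an H x W map scaled to [0, 1] (all zeros if nothing is positive).
    """
    k_maps = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weight alpha_k: global-average-pooled gradient of feature map k.
    alphas = [sum(map(sum, g)) / (h * w) for g in gradients]
    # Weighted sum of the feature maps, followed by ReLU.
    cam = [
        [max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(k_maps)))
         for j in range(w)]
        for i in range(h)
    ]
    peak = max(map(max, cam))
    if peak == 0:
        return cam
    return [[v / peak for v in row] for row in cam]
```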
To systematically reveal the confusion relationships among all 54 IAS, we constructed a species-level confusion matrix (Figure 4c), with true labels on the vertical axis and predicted labels on the horizontal axis; each cell gives the number of images of a true species assigned to a predicted species. Misclassifications are concentrated mainly among closely related species, for example Ambrosia trifida and Solidago canadensis (both Asteraceae) and Cydia pomonella and Hyphantria cunea (both lepidopteran insects with similar features). This pattern is consistent with the Grad-CAM finding that morphological similarity causes misclassifications.
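The pairs that dominate the off-diagonal of a confusion matrix can be listed automatically. This sketch, with a hypothetical function name, ranks species pairs by the total number of confusions in both directions, which suits the question of which species look alike rather than which direction the error runs.

```python
def mutual_confusions(cm, labels, n=5):
    """Species pairs ranked by confusions in both directions.

    cm[i][j] counts images of true species i predicted as species j.
    Returns up to n tuples (species_a, species_b, count), largest first.
    """
    size = len(cm)
    pairs = [
        (labels[i], labels[j], cm[i][j] + cm[j][i])
        for i in range(size)
        for j in range(i + 1, size)
        if cm[i][j] + cm[j][i] > 0
    ]
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:n]
```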
These insights informed the design of user guidelines: we recommend capturing complete, well-lit images that highlight key morphological features (e.g., leaves, stems, flower structure) and avoiding close-ups or backlit shots that may obscure relevant details.
3.1.3. Effects of Target Scale and Background Complexity
To evaluate the influence of target size and environmental background, we tested EfficientNetV2 across the 9 synthetic target scales and 9 background types; for each species, synthetic images were generated for every combination of size and background. Figure 5 presents the resulting F1-score variations.
Target scale strongly affected recognition accuracy. When the object was smaller than 100 × 100 pixels, model performance dropped sharply. Accuracy stabilized and peaked when the target covered approximately 61% of the image area (about 175 × 175 pixels). Background complexity also played a role: recognition performance tended to be lower in cluttered environments (e.g., forest floors, agricultural fields), likely because irrelevant textures distract the model.
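The area fractions quoted above follow directly from the 224 × 224 canvas used for the synthetic images:

$$\frac{175^2}{224^2} = \frac{30625}{50176} \approx 0.61, \qquad \frac{100^2}{224^2} = \frac{10000}{50176} \approx 0.20$$

$$s_{\min} = 224\sqrt{0.60} \approx 173.5 \text{ pixels}$$

The 100 × 100 threshold thus corresponds to about one fifth of the frame, and a square target needs a side of at least about 174 pixels to cover 60% of it.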
These findings were incorporated into the EyeInvaS app as a framing guide that encourages users to capture images in which the target occupies at least 60% of the frame and to avoid complex or cluttered backgrounds, improving recognition accuracy.
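A framing check of this kind can be sketched as follows. It assumes the target’s bounding box (for example, the framing-guide rectangle the user aligned with the organism) is known in pixels. How EyeInvaS actually measures coverage is not described, so the function names and the bounding-box input are assumptions; the 60% threshold and the 224-pixel canvas come from the text.

```python
CANVAS = 224         # model input side used in Section 3.1.3
MIN_FRACTION = 0.60  # framing-guide threshold from the text


def area_fraction(box_w, box_h, img_w=CANVAS, img_h=CANVAS):
    """Fraction of the image area covered by the target's bounding box."""
    return (box_w * box_h) / (img_w * img_h)


def meets_framing_guide(box_w, box_h, img_w=CANVAS, img_h=CANVAS):
    """True if the target box covers at least MIN_FRACTION of the frame."""
    return area_fraction(box_w, box_h, img_w, img_h) >= MIN_FRACTION


def min_square_side(fraction=MIN_FRACTION, canvas=CANVAS):
    """Smallest integer square side whose area reaches `fraction` of the canvas."""
    side = 1
    while side * side < fraction * canvas * canvas:
        side += 1
    return side
```

`min_square_side()` returns 174, and `meets_framing_guide(175, 175)` is true, consistent with the best-performing 175 × 175 scale. The integer search avoids floating-point edge cases that a closed-form square root can hit at the boundary.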
3.2. Application Scenarios
3.2.1. Functional Modules of the EyeInvaS App
Figure A1 presents the following main functional modules of the EyeInvaS app designed for citizen engagement:
- AI-based Image Recognition: Users can take photos or upload existing images for recognition. A built-in framing guide helps users compose images that meet the model’s optimal input conditions. The system returns the predicted species name and confidence score.
- Species Information: Users can access detailed information about the identified species, including taxonomy, ecological impact, geographic distribution, and recommended management strategies, enhancing public knowledge and awareness.
- Data Sharing: Users may add time and location metadata to their observations and upload them to the database, enabling both personal record-keeping and crowdsourced data aggregation.
- Geotagging: Integrated with Mapbox, this feature visualizes uploaded observations as geospatial points, making spatial patterns and invasion hotspots easily interpretable.
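The recognition result returned to the user (a species name plus a confidence score) is typically derived from the classifier’s raw scores with a softmax. This is a hedged sketch: the response fields and the 0.8 review threshold are hypothetical (0.8 echoes the confidence level reported in the Huai’an case study, not a documented app setting).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=3):
    """(species, confidence) pairs for the k most probable classes."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]


def recognition_result(logits, labels, min_confidence=0.8):
    """Top-1 prediction plus a flag for low-confidence results (hypothetical fields)."""
    species, conf = top_k(logits, labels, k=1)[0]
    return {
        "species": species,
        "confidence": round(conf, 4),
        "needs_review": conf < min_confidence,
    }
```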
These features together form a closed-loop workflow from image acquisition to spatial visualization, empowering the public to participate meaningfully in IAS monitoring. A short demonstration video of the EyeInvaS app, highlighting core functions such as image capture, species recognition, data sharing, and spatial visualization, is available as Supplementary Video S1.
3.2.2. Case Study: Monitoring Solidago canadensis in Huai’an, China
To evaluate real-world usability, we conducted a pilot deployment of the EyeInvaS app in Huai’an, China. We collaborated with the local “SmartEye” environmental protection group to recruit volunteers. Participants were briefed on the use of the EyeInvaS app and instructed to document occurrences of Solidago canadensis by photographing plants and uploading records with location data.
A total of 1683 valid submissions were collected. All top-1 predictions had confidence levels exceeding 80% and were confirmed as accurate by expert reviewers. Based on geotagged records, we mapped the spatial distribution of S. canadensis (Figure 6), which revealed a concentration along riverbanks and transportation corridors—areas commonly associated with anthropogenic disturbance and propagule pressure.
This case study demonstrates the system’s effectiveness in enabling community-scale monitoring and provides empirical support for its practical deployment in urban and peri-urban ecosystems.
4. Discussion
The development of the EyeInvaS system demonstrates the potential of integrating deep learning and public participation to address the global challenge of invasive species monitoring. By combining a high-performance image recognition model with a user-friendly mobile interface, this study bridges the gap between technological innovation and citizen engagement. To further inform future applications and research, several key issues merit discussion.
4.1. Dataset Expansion and Model Generalization
Although our dataset included 54 invasive species across 6 taxonomic groups and used synthetic augmentation to increase diversity, its coverage remains limited. Microbial taxa were excluded, and image samples for amphibians and reptiles were relatively scarce, which may constrain model generalizability. Factors such as background lighting, shadows, and edge blending were not simulated in the synthetic data of this study and may also affect recognition accuracy; future work will incorporate them to improve model generalization.
Future efforts should expand taxonomic coverage, particularly for less observable groups, and integrate environmental metadata (e.g., habitat type, seasonality) alongside image data to support contextual inference. In addition, semi-supervised approaches such as pseudo-labeling or self-training can leverage unlabeled user-submitted data to address class imbalance and improve recognition of rare or long-tailed species [31,32].
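Confidence-thresholded pseudo-labeling, one common form of the semi-supervised approaches mentioned above, selects high-confidence predictions on unlabeled images as training labels. The sketch below adds a per-class cap so that abundant species do not swamp rare ones; the 0.95 threshold, the cap, and the function name are illustrative assumptions, and the retraining loop is not shown.

```python
def select_pseudo_labels(predictions, threshold=0.95, max_per_class=None):
    """Pick confident predictions on unlabeled images as pseudo-labels.

    predictions: iterable of (image_id, predicted_label, confidence).
    max_per_class (optional) caps how many pseudo-labels any one class
    contributes, limiting how far common species can dominate.
    Returns a list of (image_id, label), most confident first.
    """
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    counts = {}
    selected = []
    for image_id, label, conf in ranked:
        if conf < threshold:
            break  # list is sorted, so every remaining item is below threshold
        if max_per_class is not None and counts.get(label, 0) >= max_per_class:
            continue
        counts[label] = counts.get(label, 0) + 1
        selected.append((image_id, label))
    return selected
```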
4.2. Spatial Scaling via UAV Integration
Currently, EyeInvaS relies primarily on user-driven point data, which limits its coverage at regional scales. Particularly in scenarios such as lakes and marshes with complex terrain and limited accessibility, users struggle to conduct close-range observation and recording, directly resulting in sampling gaps in monitoring data.
A promising direction is to integrate public ground observations with UAV-based aerial monitoring [33]. This multi-source framework combines the complementary strengths of crowdsourced data and drone-based sensing to achieve scalable, high-resolution surveillance. Implementing it would involve four steps: standardizing the formats of UAV-collected imaging and spectral data; aligning these data with the system’s geotagging structure for spatiotemporal consistency; using the lightweight YOLOv8 detector to locate suspected IAS patches and then applying the core EfficientNetV2 model for fine-grained classification, balancing speed and accuracy; and processing data on site in real time through embedded edge computing, transmitting only high-value information to reduce cost and latency. This integration would enable rapid field deployment, expand IAS surveillance coverage, and strengthen the EyeInvaS system for regional ecological monitoring [34,35].
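The detect-then-classify cascade described above can be sketched independently of any particular model. Here `detect` and `classify` are placeholders standing in for a YOLOv8 detector and the EfficientNetV2 classifier, `crop` stands in for whatever image-cropping routine is used, and both thresholds are hypothetical; data-format standardization and georeferencing are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Box = tuple[int, int, int, int]  # (left, top, right, bottom) in frame pixels


@dataclass
class Detection:
    box: Box
    score: float  # detector confidence that the patch is a suspected IAS


def cascade(frame,
            detect: Callable[[object], Iterable[Detection]],
            crop: Callable[[object, Box], object],
            classify: Callable[[object], tuple[str, float]],
            det_threshold: float = 0.5,
            cls_threshold: float = 0.8):
    """Detect suspected IAS patches, classify each crop, keep confident hits.

    Only the returned records would be sent back from the edge device.
    """
    reports = []
    for det in detect(frame):
        if det.score < det_threshold:
            continue  # cheap rejection before running the heavier classifier
        species, conf = classify(crop(frame, det.box))
        if conf >= cls_threshold:
            reports.append({"box": det.box, "species": species, "confidence": conf})
    return reports
```

Because only confident, classified detections are returned, an edge device running this loop would transmit a short list of records rather than raw imagery, which is the cost- and latency-saving step described in the text.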
4.3. Policy Interfaces and Institutional Integration
The long-term effectiveness of citizen science depends on its integration into formal ecological governance frameworks. The case study in Huai’an illustrates how citizen-contributed data can reveal spatial correlations between invasive spread and anthropogenic corridors, providing micro-level evidence to support policy intervention.
We recommend linking EyeInvaS with national and local IAS databases through standardized data protocols and review mechanisms. The standardized protocols would unify core data fields: aligning IAS taxonomic labels with official nomenclature, setting validity thresholds for AI recognition confidence, and ensuring that geographic coordinates meet official spatial-precision standards. The review mechanism could follow a two-stage model: AI first screens submissions to filter out invalid data, such as records with missing location information or blurred images, and experts then verify data accuracy, paying extra attention to species with similar morphological features.
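The first, automatic stage of the proposed two-stage review can be sketched as a rule-based screen. The thresholds, the look-alike groups (drawn from the confusions reported in Section 3.1), the `Submission` record, and the variance-of-Laplacian blur test are assumptions for illustration; the blur threshold in particular depends on image size and would need calibration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical settings; the text does not specify values.
MIN_CONFIDENCE = 0.8
MIN_SHARPNESS = 100.0  # variance-of-Laplacian cut-off
LOOKALIKE_GROUPS = [
    {"Ambrosia trifida", "Solidago canadensis"},
    {"Bidens pilosa", "Chromolaena odorata"},
]


@dataclass
class Submission:
    species: str          # model's top-1 prediction
    confidence: float
    lat: Optional[float]
    lon: Optional[float]
    gray: list            # grayscale pixels as a list of rows


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian; low values suggest a blurred image."""
    vals = [
        gray[i - 1][j] + gray[i + 1][j] + gray[i][j - 1] + gray[i][j + 1] - 4 * gray[i][j]
        for i in range(1, len(gray) - 1)
        for j in range(1, len(gray[0]) - 1)
    ]
    if not vals:
        return 0.0
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def pre_screen(sub: Submission):
    """Stage 1: reject invalid records; route the rest to expert review.

    Returns (status, reason) with status in {"reject", "priority_review", "review"}.
    """
    if sub.lat is None or sub.lon is None:
        return "reject", "missing location"
    if not (-90 <= sub.lat <= 90 and -180 <= sub.lon <= 180):
        return "reject", "invalid coordinates"
    if laplacian_variance(sub.gray) < MIN_SHARPNESS:
        return "reject", "image too blurred"
    if sub.confidence < MIN_CONFIDENCE:
        return "priority_review", "low model confidence"
    if any(sub.species in group for group in LOOKALIKE_GROUPS):
        return "priority_review", "look-alike species"
    return "review", "routine verification"
```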
Following global platforms such as iNaturalist and EDDMapS, incentive systems such as contributor badges or leaderboards could improve user retention and the consistency of data submission. By recognizing contributions explicitly, such mechanisms let users see the practical value of their participation in ecological monitoring and encourage them to keep participating. Possible forms include contributor badges for first observations or rare-species discoveries and participation leaderboards that showcase regional contribution levels.
5. Conclusions
This study introduces EyeInvaS, a deep learning-powered intelligent recognition system that enables convenient identification of invasive species. By leveraging neural networks and mobile technologies, we enhance the ability of non-specialist users to accurately identify invasive species.
We constructed a novel image dataset covering 54 invasive species of management priority in China and systematically evaluated nine mainstream deep learning models. EfficientNetV2 was identified as the optimal backbone. Through controlled experiments on object scale and background complexity, we revealed key factors affecting model performance and informed practical image acquisition strategies. These findings were embedded into the app’s framing guide for improved user input. The EyeInvaS system integrates image acquisition, species recognition, geotagging, and data sharing in a closed-loop workflow and demonstrated real-world efficacy in a field case study in Huai’an, China.
Future work will focus on dataset expansion and cross-platform integration, as well as institutional adoption pathways. This study contributes a scalable, replicable framework for real-time, public-powered surveillance of invasive species and offers a practical tool for biodiversity conservation and biosecurity.
References
- [1] Hulme P.E. Trade, transport and trouble: Managing invasive species pathways in an era of globalization. J. Appl. Ecol. 2009, 46, 10–18. doi:10.1111/j.1365-2664.2008.01600.x
- [2] Pyšek P., Hulme P.E., Simberloff D., Bacher S., Blackburn T.M., Carlton J.T., Dawson W., Essl F., Foxcroft L.C., Genovesi P., et al. Scientists’ warning on invasive alien species. Biol. Rev. 2020, 95, 1511–1534. doi:10.1111/brv.12627. PMID: 32588508; PMCID: PMC7687187.
- [3] IPBES. Thematic Assessment Report on Invasive Alien Species and Their Control, 4th ed.; IPBES Secretariat: Bonn, Germany, 2023.
- [4] Venette R.C., Hutchison W.D. Invasive Insect Species: Global Challenges, Strategies & Opportunities. Front. Insect Sci. 2021, 1, 650520. doi:10.3389/finsc.2021.650520. PMID: 38468878; PMCID: PMC10926476.
- [5] Bellard C., Cassey P., Blackburn T.M. Alien species as a driver of recent extinctions. Biol. Lett. 2016, 12, 20150623. doi:10.1098/rsbl.2015.0623. PMID: 26888913; PMCID: PMC4780541.
- [6] Valiente-Banuet A., Aizen M.A., Alcántara J.M., Arroyo J., Cocucci A., Galetti M., García M.B., García D., Gómez J.M., Jordano P., et al. Beyond species loss: The extinction of ecological interactions in a changing world. Funct. Ecol. 2015, 29, 299–307. doi:10.1111/1365-2435.12356
- [7] Diagne C., Leroy B., Vaissière A.-C., Gozlan R.E., Roiz D., Jarić I., Salles J.-M., Bradshaw C.J., Courchamp F. High and rising economic costs of biological invasions worldwide. Nature 2021, 592, 571–576. doi:10.1038/s41586-021-03405-6. PMID: 33790468.
- [8] Turbelin A.J., Cuthbert R.N., Essl F., Haubrock P.J., Ricciardi A., Courchamp F. Biological invasions are as costly as natural hazards. Perspect. Ecol. Conserv. 2023, 21, 143–150. doi:10.1016/j.pecon.2023.03.002
