Air-ground collaborative multi-source orbital integrated detection system: Combining 3D imaging and intrusion recognition
Mengyuan Yan, Xingyu Yang, Wei Gao, Lifan Rong, Shengbo Li, Yuan Xiong, Zhihong Yao

TL;DR
This paper introduces a new rail inspection system combining ground and aerial LiDAR with AI to improve railway safety and efficiency.
Contribution
A novel air-ground collaborative system integrating 3D imaging and intrusion detection for railway inspection is proposed.
Findings
An improved LOAM-SLAM algorithm enables real-time dynamic mapping for rail inspection.
An optimized ICP algorithm achieves high-precision point cloud registration and colorization.
A YOLOv3-ResNet model achieves 97% recall and 99% precision for intrusion detection.
Abstract
With the rapid expansion of railway networks globally, ensuring rail infrastructure safety through efficient detection methods has become critical. Traditional inspection systems face limitations in flexibility, adaptability to adverse weather, and multifunctional integration. This study proposes a ground-air collaborative multi-source detection system that integrates 3D light detection and ranging (LiDAR)-based point cloud imaging and deep learning-driven intrusion detection. The system employs a lightweight rail inspection vehicle equipped with dual LiDARs and an Astra camera, synchronized with an unmanned aerial vehicle (UAV) carrying industrial-grade LiDAR. An improved LiDAR odometry and mapping with sliding window (LOAM-SLAM) algorithm enables real-time dynamic mapping, while an optimized iterative closest point (ICP) algorithm achieves high-precision point cloud…
- Shijiazhuang Key Laboratory of Intelligent Vertical Take-off and Landing Fixed-wing UAV Research
- Open Project Program of Provincial and Ministerial-Level Key Laboratories
Taxonomy
Topics: Remote Sensing and LiDAR Applications · 3D Surveying and Cultural Heritage · Infrastructure Maintenance and Monitoring
Introduction
Railway transportation has become a dominant mode of transportation due to its high speed, low cost, large capacity, and convenience. The exponential growth of global railway networks necessitates advanced inspection solutions to ensure safety and efficiency. By 2035, China’s railway mileage is projected to reach 200,000 km [1], while the European Union’s TEN-T network is expected to expand to 25,000 km by 2030 [2]. Ensuring rail safety requires frequent inspections to detect infrastructure degradation, such as rail cracks and sleeper displacement, as well as foreign object intrusions, including rocks and trespassers. Traditional inspection methods, such as manual inspections and dedicated inspection trains, are plagued by inefficiency, high costs, and limited adaptability to harsh environments [3].
Existing solutions face three key challenges: (1) limited operational windows (≤4 night hours/day) for visual systems; (2) poor performance in low-visibility conditions; and (3) inflexibility in complex terrains.
Recent technological advancements have attempted to address these challenges through sensor fusion. The EU HORIZON 2020 SAFER-LC project utilized a 77 GHz millimeter-wave radar, achieving a detection range of 120 meters in fog with visibility less than 50 meters, at a unit cost of approximately €18,000 [4]. In China, solutions such as CRRC’s autonomous inspection train have improved accuracy to 94.7%, but require dedicated rail scheduling, causing a 28% operational delay [5]. Commercial systems like PerceptRail’s vision-LiDAR fusion achieve real-time processing but lack 3D spatial awareness, resulting in a 17.3% false-negative rate for overhead obstacles [6].
We present a ground-air collaborative system that synergizes a lightweight rail vehicle (1.2 m length, 30 kg payload) with a heavy-lift UAV (5 kg payload capacity), integrating dual Livox Avia LiDARs, an Astra Pro Plus 3D camera, and a V206 industrial LiDAR. The system’s core innovations include:
- Multi-Source Fusion Architecture: Combines ground-level dense point clouds (720,000 pts/s) with aerial broad-coverage scans (2 million pts/s) using an enhanced ICP algorithm with feature-weighted correspondence matching.
- Adaptive SLAM Framework: Modifies LOAM-SLAM with motion compensation for rail vehicle kinematics, achieving 6-DoF pose estimation at 10 Hz.
- Hybrid Detection Model: Integrates YOLOv3’s real-time detection with ResNet-50’s feature extraction, optimized for railway-specific obstacles (F1-score: 0.96).
As summarized in Table 1, the proposed system achieves superior accuracy (97%) and cost-effectiveness (120k CNY/km) compared to conventional methods.
Table 1: Comparative analysis of railway inspection technologies.
Methods
System design
The proposed air-ground collaborative system integrates a ground-based rail inspection vehicle and an unmanned aerial vehicle (UAV) to achieve multi-source 3D reconstruction and intrusion detection. The ground module operates on a custom rail vehicle with dual LiDARs and a 3D camera, while the UAV module carries industrial-grade LiDAR for aerial scanning [7]. Data synchronization between platforms is achieved via GPS timestamps and ROS-based communication protocols.
- ROS Framework: The system operates on ROS Melodic under Ubuntu 18.04, enabling seamless communication between the ground and aerial modules. ROS nodes manage sensor data synchronization, point cloud fusion, and real-time visualization. Custom ROS packages were developed for LiDAR calibration, dynamic mapping, and intrusion detection.
- Data Synchronization: Ground and aerial data are synchronized via GPS timestamps with microsecond precision. A custom ROS-based protocol ensures alignment of LiDAR scans, RGB images, and UAV positional data.
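The synchronization protocol itself is not described in detail. As a minimal sketch of the general idea, assuming each sensor message carries an integer GPS timestamp in microseconds, scans can be paired with the nearest image and rejected when the gap is too large. The function name and the 50 ms tolerance are illustrative assumptions, not values from the system.

```python
from bisect import bisect_left


def match_nearest(scan_stamps, image_stamps, max_gap_us=50_000):
    """Pair each LiDAR scan timestamp with the nearest image timestamp.

    Both inputs hold integer microseconds; image_stamps must be sorted
    ascending. Pairs further apart than max_gap_us (an assumed tolerance)
    are dropped.
    """
    pairs = []
    for t in scan_stamps:
        i = bisect_left(image_stamps, t)
        # The nearest stamp is either just before or just after position i.
        candidates = image_stamps[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        best = min(candidates, key=lambda s: abs(s - t))
        if abs(best - t) <= max_gap_us:
            pairs.append((t, best))
    return pairs
```

In a ROS deployment this role is usually played by an approximate-time message synchronizer; the sketch only shows the matching logic.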
Statement.
The system’s observation module employs non-intrusive scanning technology, which exclusively targets the rail tracks and surrounding environments within the school’s base public areas for data collection, without involving private domains or the acquisition of personal information. According to Article 32 of the ‘Science and Technology Progress Law of the People’s Republic of China’, which states that ‘the state supports the use of new technologies to conduct non-intrusive scientific research activities’, this part of the study does not require additional administrative approval. Additionally, the raw data is spatially trimmed using the open-source software CloudCompare and processed in an offline environment, meeting all the necessary requirements.
Hardware configuration
Ground module.
- Vehicle: Custom rail inspection platform (30 km/h max speed) with NVIDIA Jetson Nano (4 GB RAM, 128-core GPU).
- Sensors: Dual Livox Avia LiDARs (720,000 points/s, ±2 cm accuracy, 70° × 77° FoV); Astra Pro Plus 3D camera (1920 × 1080 @ 30 fps, 60° × 49.5° FoV).
- Software: Ubuntu 18.04 with ROS Melodic for real-time sensor fusion.
Aerial module.
- UAV: PLA-1500 hexacopter (5 kg payload, 7-level wind resistance) [8].
- Sensors: Velodyne VLP-16 LiDAR (300,000 points/s, ±3 cm accuracy, 360° × 30° FoV).
- Localization: RTK-GPS (horizontal accuracy ±1 cm + 1 ppm).
Synchronized Scanning
- Ground LiDARs capture rail surface topology (5 cm resolution).
- UAV LiDAR maps surrounding infrastructure (20 cm resolution).
- Scans are time-synchronized via ROS timestamps (μs precision).
Point cloud data processing and fusion
Data processing.
Normal Distributions Transform (NDT) Initialization [9]
- Target Point Cloud Voxelization: The NDT algorithm converts the target point cloud into Normal Distribution (ND) voxels, where each voxel is modeled by a Gaussian distribution with mean μi and covariance Σi.
- Initial Pose Estimation: The initial pose (x,y,z,roll,pitch,yaw) of the input point cloud is estimated to accelerate parameter optimization convergence.
- Fitness Function: To quantify the matching degree between the input and target point clouds, we define a fitness function based on the Mahalanobis distance [10], as shown in formula (1):
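Formula (1) is not reproduced in this text. One form consistent with the symbol definitions that follow is the negated NDT score, written here as a sketch: the leading minus sign (so that smaller is better) and the factor of 1/2 in the exponent are assumptions taken from the standard NDT formulation, and each transformed point Pj′ is assumed to be evaluated against the Gaussian of the voxel i it falls into.

```latex
\mathrm{Fitness} \;=\; -\sum_{i} \exp\!\left( -\,\frac{\left(P_j' - \mu_i\right)^{T} \Sigma_i^{-1} \left(P_j' - \mu_i\right)}{2} \right) \tag{1}
```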
In formula (1), Fitness indicates the matching degree, where a smaller value suggests a better match. ∑ denotes the summation over each Gaussian distribution in the target point cloud. e is the base of the natural logarithm. Pj′ is the coordinate of the j-th transformed point in the input point cloud. μi is the mean vector of the i-th Gaussian distribution in the target point cloud. Σi is the covariance matrix of the i-th Gaussian distribution. T represents the transpose of a matrix or vector, and −1 indicates the inverse of a matrix.
Multi-LiDAR Calibration
- Dual LiDARs achieve a horizontal field of view (FoV) >120° through extrinsic calibration.
- Non-repetitive scanning mode generates high-density point clouds at 1.44 × 10^6 points/s.
The calibration outcomes directly support the mapping performance quantified later in Table 2.
Table 2: Performance comparison of ICP algorithms.
Dynamic mapping and registration.
- Improved LOAM Algorithm [11]
  - Feature Extraction: Line and planar features are extracted from raw Livox LiDAR data using curvature-based filtering.
  - Motion Distortion Correction: An iterative pose optimization strategy continuously refines point cloud registration during motion.
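The curvature-based filtering step can be illustrated with the smoothness measure used in the original LOAM formulation: for each point, sum the difference vectors to its neighbours and normalize by the neighbourhood size and the point's range. The sketch below assumes the points are already ordered along one scan line; the window size and both thresholds are hypothetical, and the improved algorithm's actual parameters are not given in the text.

```python
import math


def loam_curvature(points, half_window=2):
    """Return a LOAM-style smoothness value per point (None near the ends).

    points: list of (x, y, z) tuples ordered along a single scan line.
    """
    curv = [None] * len(points)
    for i in range(half_window, len(points) - half_window):
        px, py, pz = points[i]
        sx = sy = sz = 0.0
        for j in range(i - half_window, i + half_window + 1):
            if j == i:
                continue
            qx, qy, qz = points[j]
            sx += qx - px
            sy += qy - py
            sz += qz - pz
        norm_p = math.sqrt(px * px + py * py + pz * pz)
        if norm_p == 0.0:
            continue  # a point at the sensor origin has no defined range
        curv[i] = math.sqrt(sx * sx + sy * sy + sz * sz) / (2 * half_window * norm_p)
    return curv


def classify(curv, edge_thresh=0.1, plane_thresh=0.01):
    """Label points as 'edge', 'plane', 'none', or 'skip' (assumed thresholds)."""
    labels = []
    for c in curv:
        if c is None:
            labels.append("skip")
        elif c > edge_thresh:
            labels.append("edge")
        elif c < plane_thresh:
            labels.append("plane")
        else:
            labels.append("none")
    return labels
```

High values mark line (edge) features such as rail edges, and low values mark planar features such as ballast surfaces or walls.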
The improved LOAM algorithm generates real-time dynamic maps, with representative results shown in Fig 1.
Fig 1. Dynamic mapping results of test data.
- Colorization and Visualization
  - Time-Synchronized Alignment: LiDAR scans (.las files) are matched with UAV-captured RGB images by timestamp proximity [12]. Large temporal mismatches are corrected using linear motion equations.
  - Depth Map Projection: LiDAR point clouds are projected onto RGB images to generate depth maps, enabling cross-modal data alignment in a unified Cartesian coordinate system.
  - Multi-Stage Registration: Coarse alignment is achieved using the Normal Distributions Transform (NDT), followed by fine registration with feature-weighted Iterative Closest Point (ICP).
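The depth map projection step can be sketched with a plain pinhole camera model. This is an illustration under stated assumptions: the points are taken to be already transformed into the camera frame (z pointing forward), lens distortion is ignored, and the intrinsics `fx`, `fy`, `cx`, `cy` are placeholder parameters rather than calibrated values from the system.

```python
def project_to_depth_map(points_cam, fx, fy, cx, cy, width, height):
    """Project camera-frame 3D points into a sparse depth map.

    Returns a dict {(u, v): depth} that keeps the nearest depth per pixel.
    points_cam: iterable of (x, y, z) in metres, z along the optical axis.
    """
    depth = {}
    for x, y, z in points_cam:
        if z <= 0:
            continue  # behind the camera
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if (u, v) not in depth or z < depth[(u, v)]:
                depth[(u, v)] = z
    return depth
```

Keeping the nearest depth per pixel is a simple occlusion rule: when several LiDAR returns land on the same pixel, the closest surface is the one the camera sees.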
Fig 2 demonstrates limitations of the original ICP algorithm, while Fig 3 validates the enhanced registration accuracy of our improved method.
Fig 2. Point cloud data registration using the original ICP algorithm.
Fig 3. Registered and fused point cloud data using the improved ICP algorithm.
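The text does not specify how feature weights enter the ICP update. One common reading, shown here as a simplified sketch, is that each correspondence contributes to the rigid alignment in proportion to a weight. The example solves a single weighted alignment step in 2D in closed form; a full implementation would work in 3D (typically via SVD), re-estimate correspondences each iteration, and derive the weights from feature quality. All names are illustrative.

```python
import math


def weighted_rigid_2d(src, dst, weights):
    """Return (theta, tx, ty) minimizing sum_i w_i * |R(theta) src_i + t - dst_i|^2.

    src, dst: lists of corresponding (x, y) points; weights: positive floats.
    """
    w_sum = sum(weights)
    # Weighted centroids of both point sets.
    sx = sum(w * p[0] for w, p in zip(weights, src)) / w_sum
    sy = sum(w * p[1] for w, p in zip(weights, src)) / w_sum
    dx = sum(w * q[0] for w, q in zip(weights, dst)) / w_sum
    dy = sum(w * q[1] for w, q in zip(weights, dst)) / w_sum
    num = den = 0.0
    for w, (px, py), (qx, qy) in zip(weights, src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - dx, qy - dy
        num += w * (ax * by - ay * bx)  # weighted cross term
        den += w * (ax * bx + ay * by)  # weighted dot term
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty
```

Giving larger weights to stable features (for example rail edges) and smaller weights to noisy ones (for example vegetation) reduces the influence of poor correspondences on the estimated transform.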
Colorization and visualization.
- RGB Fusion
The RGB mapping module of PCL is utilized to project UAV-captured RGB images onto LiDAR point clouds.
- Data Enhancement
  - Outlier Removal: Euclidean clustering filters noise with a distance threshold of 0.1 m.
  - Hole Filling: Missing regions are interpolated using radial basis functions (RBF).
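As a rough illustration of the outlier removal step, the sketch below grows clusters by linking points closer than the 0.1 m threshold and discards clusters that are too small. The minimum cluster size is an assumed parameter that the text does not state, and the brute-force neighbour search is only suitable for small examples; a production pipeline would use a k-d tree, as PCL's Euclidean cluster extraction does.

```python
import math
from collections import deque


def _dist(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def euclidean_cluster_filter(points, tol=0.1, min_size=3):
    """Keep only points belonging to clusters with at least min_size members.

    Two points share a cluster when a chain of neighbours, each within
    tol metres of the next, connects them.
    """
    n = len(points)
    visited = [False] * n
    kept = []
    for seed in range(n):
        if visited[seed]:
            continue
        visited[seed] = True
        cluster = [seed]
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if not visited[j] and _dist(points[i], points[j]) <= tol:
                    visited[j] = True
                    cluster.append(j)
                    queue.append(j)
        if len(cluster) >= min_size:
            kept.extend(cluster)
    return [points[i] for i in sorted(kept)]
```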
- Interactive Visualization
Implement dynamic magnification of target objects via point cloud offset algorithms.
Colorized point cloud outputs are shown in Fig 4, with Fig 5 providing a magnified view of point cloud details.
Fig 4. Colorized point cloud data.
Fig 5. Partial enlarged view of point cloud.
Workflow
Data acquisition.
- Ground LiDAR: Captures the topography of the railway surface.
- Aerial LiDAR: Scans infrastructure at an altitude of 50 meters.
Point cloud fusion.
- Timestamp Matching: Aligns LiDAR scans and UAV images using linear motion equations to handle large temporal mismatches.
- Depth Map Projection: Projects LiDAR points onto RGB images to generate depth maps.
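The linear motion equations are not written out in the text. A minimal sketch of the idea, assuming constant velocity between two consecutive position fixes, estimates where the platform was at the timestamp of a scan or image that falls between them. The function and argument names are illustrative.

```python
def interpolate_position(t, t0, p0, t1, p1):
    """Estimate a 3D position at time t from two timestamped fixes.

    Assumes constant velocity between (t0, p0) and (t1, p1); values of t
    outside [t0, t1] are linearly extrapolated.
    """
    if t1 == t0:
        raise ValueError("fixes must have distinct timestamps")
    a = (t - t0) / (t1 - t0)
    return tuple(x0 + a * (x1 - x0) for x0, x1 in zip(p0, p1))
```

For example, with fixes at t = 0 s and t = 10 s that are 10 m apart along x at a constant 50 m altitude, a scan stamped t = 5 s is assigned the midpoint.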
Interactive visualization.
- CloudCompare Implementation: Utilized for 3D rendering and dynamic magnification of the point cloud.
Edge-device deployment.
- Model Optimization: Trained using YOLOv3 + ResNet, optimized for Jetson Nano deployment, reducing inference latency by 30%.
- Temporal Filtering: Suppresses false positives by analyzing detection consistency across consecutive frames.
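The temporal filtering rule is not specified beyond "consistency across consecutive frames". One simple way to realize it, shown here as a sketch, is to confirm a class only when it appears in at least `min_hits` of the last `window` frames. Both parameters and the class name are assumptions for illustration.

```python
from collections import deque


class TemporalFilter:
    """Confirm a detection class only if it recurs across recent frames."""

    def __init__(self, window=5, min_hits=3):
        self.history = deque(maxlen=window)  # one set of labels per frame
        self.min_hits = min_hits

    def update(self, labels):
        """Add the labels detected in the current frame; return confirmed labels."""
        self.history.append(set(labels))
        counts = {}
        for frame in self.history:
            for label in frame:
                counts[label] = counts.get(label, 0) + 1
        return {label for label, c in counts.items() if c >= self.min_hits}
```

A single-frame spurious detection never reaches the hit count and is dropped, at the cost of delaying a genuine alert by a few frames.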
Machine vision-based intrusion detection
Data preparation.
- Dataset Construction
  - Data Sources: The training dataset for intrusion detection was constructed by integrating publicly available datasets (e.g., RailSem19, containing 10,000 annotated images) with 15,000 high-resolution images collected from field inspections. These images encompass diverse environmental conditions, including daytime, nighttime, rainy, and foggy scenarios.
  - Intrusion Categories: The dataset covers six critical intrusion categories: humans, animals, rocks, vehicles, debris, and vegetation, ensuring balanced coverage of real-world scenarios.
  - Data Augmentation and Preprocessing: To enhance model robustness, the following augmentation techniques were applied:
    (1) Spatial Augmentation: Random rotation (±15°), horizontal flipping (50% probability), and scaling (0.8–1.2×).
    (2) Pixel-level Augmentation: Brightness adjustment (±20%), Gaussian noise (σ = 0.01), and motion blur (3 × 3 kernel).
    (3) Normalization: Images were resized to a uniform resolution and normalized to zero mean and unit variance.
Representative examples of annotated images and XML annotation files are illustrated in Figs 6 and 7, respectively, highlighting the diversity of intrusion scenarios and the precision of bounding box annotations.
Fig 6. Field-acquired training dataset of the proposed project.
Fig 7. Annotation files generated from the training set.
- Two-Stage Detection Framework
  - Classification First: A ResNet-50 binary classifier pre-filters images to reduce computational load.
  - YOLOv3 Detection [13]: Modified Darknet-53 backbone with feature pyramid networks (FPN) achieves multi-scale detection.
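The control flow of the two-stage framework can be sketched as a gate in front of the detector. The `classifier` and `detector` arguments are hypothetical callables standing in for the ResNet-50 and YOLOv3 models, and the 0.5 gate is an assumed value; none of these names come from the source.

```python
def two_stage_detect(image, classifier, detector, gate=0.5):
    """Run the detector only when the binary classifier flags the image.

    classifier(image) -> float in [0, 1], probability that an intrusion is present.
    detector(image)   -> list of (label, confidence, (x1, y1, x2, y2)) tuples.
    """
    if classifier(image) < gate:
        return []  # skip the expensive detector for frames judged clear
    return detector(image)
```

Because most frames on an unobstructed track contain no intrusion, skipping the detector for them is where the computational saving comes from. The trade-off is that any intrusion the classifier misses is never seen by the detector, so the gate bounds the overall recall.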
Model training and evaluation.
Implementation Details
- Environment: Python 3.7, TensorFlow 2.4, CUDA 11.0 on Jetson Nano.
- Hyperparameters: Initial learning rate 10^−3, batch size 16, Adam optimizer.
Performance Metrics
- Precision: 99% at IoU threshold 0.5.
- Recall: 97% with temporal filtering to suppress false positives [14].
- Speed: 25 FPS at 640 × 480 resolution.
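For reference, the sketch below shows how these metrics are conventionally computed: a predicted box counts as a true positive when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5, and precision and recall follow from the resulting counts. The box format `(x1, y1, x2, y2)` and the function names are assumptions; the paper's evaluation script is not available.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```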
Training convergence characteristics are analyzed in Fig 8, showing stable loss reduction over 100 epochs.
Fig 8. Model evaluation at 100 epochs.
Results
Point cloud imaging
The fused ground-air LiDAR system achieved a point density of 1,200 pts/m², with colorized 3D models accurately reproducing rail fasteners [15] and ballast details [16]. Nighttime tests confirmed LiDAR’s superiority over RGB cameras, maintaining 95% detection accuracy in fog. Quantitative improvements in registration performance are evidenced in Table 2, where our method reduces errors to 4.3 ± 0.4 mm.
Figs 9 and 10 highlight the necessity of collaborative operation by showing degraded mapping quality when excluding UAV or ground data. Full collaborative operation achieves optimal results as demonstrated in Fig 11, combining aerial and ground LiDAR data.
Fig 9. Point cloud imaging result with UAV data absence.
Fig 10. Point cloud imaging result excluding track inspection.
Fig 11. Mapping performance using test data under collaborative operation of airborne radar and vehicle-mounted radar.
Intrusion detection
Building on the hardware capabilities shown in Figs 12 and 13, the detection module achieved 97% recall and 99% precision at 25 FPS on a Jetson Nano, outperforming Faster R-CNN [17]. False positives decreased by 63% through temporal filtering of detection results. As benchmarked in Table 3, our YOLOv3-ResNet model outperforms alternatives with 0.97 mAP while maintaining edge-device compatibility. Fig 14 visually compares the two-stage detection framework’s classification and localization outputs. (The individual pictured in Fig 14 has provided written informed consent to publish their image alongside the manuscript.)
Table 3: Performance benchmark of detection models.
Fig 12. Architecture diagram of the ground module.
Fig 13. Architecture diagram of the aerial module.
Fig 14. Comparative visualization: classification results (left) vs. object detection results (right).
Discussion
This system overcomes three limitations of current rail inspection technologies:
- Mobility and Flexibility: The unmanned aerial vehicle (UAV) addresses coverage blind zones [18], while the ground-based modules enable access to remote sections inaccessible to large inspection vehicles.
- All-Weather Operation: LiDAR-based mapping functions effectively in rain/fog, extending inspection windows by 400%.
- Multi-Functionality: Simultaneous 3D reconstruction and intrusion detection reduce operational costs by 35%.
Comparisons with several commercially available systems show superior cost-effectiveness (1/5 the price) while matching accuracy. Future work will integrate millimeter-wave radar for improved penetration in dense vegetation.
Conclusions
This study proposes a ground-air collaborative railway inspection system that integrates multi-source LiDAR sensing and hybrid deep learning, addressing critical limitations in current rail infrastructure monitoring. By synergizing a lightweight rail vehicle equipped with dual Livox Avia LiDARs and a heavy-lift unmanned aerial vehicle (UAV) carrying a V206 industrial LiDAR, the system achieves unprecedented spatial coverage and operational flexibility. Field tests under diverse environmental conditions demonstrate a detection accuracy of 97% for rail defects and foreign objects [19], with a point cloud density of 1,200 points/m²—significantly outperforming traditional dedicated inspection trains while greatly reducing operating costs.
Key algorithmic advancements include a feature-weighted iterative closest point (ICP) registration method, which reduces alignment errors to 4.3 ± 0.4 mm. Complementing this, the hybrid YOLOv3-ResNet detection model, optimized for edge deployment on NVIDIA Jetson Nano, achieves real-time inference at 25 frames per second (FPS) with 99% precision and 97% recall across six intrusion categories (humans, debris, vegetation, etc.). Temporal filtering further suppresses false positives by 63%, ensuring reliable performance in dynamic railway environments.
The system’s practical utility has been validated in challenging scenarios, including foggy conditions where the LiDAR-based architecture maintains 95% detection accuracy, extending operational windows by 400% compared to vision-only systems. These capabilities position the system as a transformative tool for emergency response, enabling real-time intrusion alerts within 0.5 seconds of detection.
Three key directions emerge for future development: (1) Integration of 77 GHz millimeter-wave radar to enhance penetration capability in vegetated areas; (2) Implementation of edge-cloud collaborative computing frameworks to scale the system for monitoring extensive rail networks (>100 km); (3) Standardization of API interfaces for interoperability with existing railway management platforms, such as CRRC’s autonomous inspection systems. This work establishes a scalable paradigm for intelligent infrastructure maintenance, with potential applications extending to landslide early warning and smart transportation networks.
References
- [1] National Railway Administration. China railway development white paper. Beijing, China: Railway Science and Technology Press; 2023.
- [2] European Union Agency for Railways. EU rail infrastructure: safety and performance. Luxembourg: Publications Office of the European Union; 2023.
- [3] Ma CX, Zhang EY, Fang Y, Han Q. Development and application of onboard track condition inspection technology. China Railw. 2017;10:91–5.
- [4] European Commission. HORIZON 2020: research and innovation programme. Brussels: European Commission; 2023.
- [5] Li ZY, Liu F, Yang W, Peng S, Zhou J. Deep learning-based high-speed railway track detection. IEEE Trans Intell Transp Syst. 2021;22:3015–26. doi: 10.1109/TITS.2021.3068321
- [6] Zhang H, Wang L, Chen X. LiDAR-based 3D reconstruction of railway clearance gauge. Autom Constr. 2022;142:104512. doi: 10.1016/j.autcon.2022.104512
- [7] Li XM. Design and implementation of intelligent railway inspection robot system. Electron Compon Inf Technol. 2022;6:231–5.
- [8] Smith J, Lee K, Brown A. UAV-LiDAR fusion for railway clearance inspection. Remote Sens. 2023;15:1234. doi: 10.3390/rs15051234
