LAT-BirdDrone: A dedicated dataset for high-precision classification of low-altitude small target trajectories enhanced by hybrid neural networks
Ming Ke, Xin Kang, Zixuan Zhang, Lubin Wang, Gang Wang

TL;DR
The paper introduces LAT-BirdDrone, a new dataset for classifying trajectories of small flying targets like birds and drones, using advanced detection systems and hybrid neural networks.
Contribution
The novel contribution is the creation of a dedicated dataset for low-altitude micro-target trajectory classification, addressing a gap in existing datasets.
Findings
LAT-BirdDrone contains 33,262 image samples and 665 trajectory data entries covering birds and drones.
The dataset supports evaluation of CNN, Transformer, and hybrid models for trajectory classification.
It provides resources for optimizing optoelectronic sensors in low-altitude monitoring.
Abstract
Currently, most existing datasets predominantly focus on object classification based on appearance, while datasets specifically designed for trajectory classification of low-altitude flying micro-targets (11pt × 11pt or smaller) remain scarce. There is also an urgent need for a dedicated dataset to provide comprehensive data support for verifying the impact of the CNN module on trajectory classification models. To address these gaps, this research utilized the Three-band refrigeration photoelectric turntable Z50IV-CTVC690110–21,100—an advanced long-distance optoelectronic detection system independently developed by Hepu Vision—to collect data. The data collection covered multiple videos, including visible light and infrared footage of birds and drones. For data processing, the YOLOv11n algorithm was applied for object detection and trajectory extraction, and the ByteTrack algorithm was…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13
Figure 14
Figure 15
Figure 16
Figure 17
Figure 18
Figure 19
Figure 20Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsUAV Applications and Optimization · Robotics and Sensor-Based Localization · Video Surveillance and Tracking Methods
Specifications TableSubjectComputer SciencesSpecific subject areaLow-altitude micro-target (≤11pt×11pt) trajectory classification; scarce datasets hinder research. This study builds LAT-BirdDrone via Hepu Vision's turntable, evaluates CNN/Transformer/LSTM, proposes multi-modal hybrid models, aiding low-altitude security & optoelectronic sensor apps.Type of dataVideo,LabelData collectionData were collected using the Three-band refrigeration photoelectric turntable Z50IV-CTVC690110–21,100 (integrates HD visible-light camera, cooled thermal imaging, optional laser ranging; designed for all-weather target detection/positioning/tracking/recognition), recording visible-light/infrared videos of 4 drones (DJI Air, DJI Inspire, Mavic Pro, DJI Mini) and 4 birds (crow, sparrow, goose, magpie) at 1920×1080/640×512 resolution, 25fps. 65 videos (15 drone-only, 50 bird-only) were selected from raw data. Inclusion criteria: stable camera, few flying targets, sparse bird distribution. Exclusion criteria: bird videos <3 s, blurry footage from lens transitions. Targets were manually annotated (position, size, category, completeness, background clarity); annotations were verified by merging with original videos.Data source locationBeijing,ChinaData accessibilityRepository name: LAT-BirdDroneDirect URL to data: (https://www.scidb.cn/anonymous/ZkFWRlJ2)Low-altitude flying small targets including datasets of birds and dronesRelated research articleNone
Value of the Data
1
- •Addressing the inadequacy of existing datasets, which generally lack trajectory-level semantic annotations and kinematic parameter records, the LAT-BirdDrone dataset not only generates 33,262 image samples through object detection but also improves the accuracy of trajectory extraction using the ByteTrack algorithm, producing 665 pieces of detailed trajectory data. It covers key dynamic features of targets during motion, providing core data support for trajectory classification research on low-altitude micro-targets (11pt × 11pt or smaller).
- •In response to the issue that most datasets only contain RGB single-modal data, which limits the development of multimodal fusion algorithms, LAT-BirdDrone integrates dual-modal videos of visible light (4 K 60fps) and infrared (640×512 25fps). This effectively alleviates the interference of lighting changes and complex backgrounds on target recognition, meeting the training and validation needs of multimodal fusion models.
- •To overcome the shortage of existing datasets in covering complex scenarios (e.g., adverse weather, special lighting conditions), LAT-BirdDrone includes 6 types of backgrounds such as urban areas, farmland, and lakes, along with data collected under special lighting (e.g., dusk and dawn) and adverse weather (e.g., rain, snow, and sandstorms). It can more realistically simulate actual low-altitude monitoring environments and enhance the generalization ability of models in complex scenarios.
- •Considering the insufficient continuity and accuracy in annotating low-altitude micro-targets in existing datasets, LAT-BirdDrone optimizes the anchor box strategy when using YOLOv11n for object detection, focusing on ensuring the continuity and consistency of annotations for micro-targets smaller than 10pt. It addresses common issues such as “trajectory breaks” and “mislabeling” in micro-target tracking, providing high-quality annotated data for micro-target trajectory analysis.
- •Aiming at the limitation that existing datasets hardly support the performance evaluation of composite neural networks (e.g., spatiotemporal fusion models), LAT-BirdDrone can systematically support the performance testing of basic models (such as CNN, Transformer, and LSTM) and multimodal hybrid architectures (e.g., Transformer+LSTM, CNN+BiLSTM). It provides a standardized data benchmark for model innovation and optimization in the field of low-altitude micro-target trajectory classification.
Background
2
Existing datasets mostly focus on appearance-based object classification, while those dedicated to trajectory classification of low-altitude micro-targets (≤11pt×11pt) are scarce—hindering the development of models for trajectory-based analysis of such targets. Meanwhile, a specialized dataset is needed to verify the CNN module’s impact on trajectory classification models. This dataset used Hepu Vision’s three-band refrigeration photoelectric turntable Z50IV-CTVC690110–21,100 to collect visible light and infrared videos of birds and drones, with YOLOv11n for object detection and ByteTrack to improve trajectory extraction accuracy. It adds value to the original research by providing detailed dataset background, laying a foundation for low-altitude micro-target research in low-altitude security.
Data Description
3
We selected four drones—DJI Air, DJI Inspire, Mavic Pro, and DJI Mini(as shown in Figs. 1a–d) —primarily due to their widespread use and portability in practical applications.Fig. 1. Unmanned aerial vehicles (UAVs) featured in LAT-BirdDrone.Fig 1
Furthermore, four bird species—crow, sparrow, goose, and magpie (as shown in Figs. 2a–d)—were selected for this study. These particular species were chosen due to their size and coloration, which bear a striking resemblance to the DJI Mavic and DJI Mini drones. The crow, with its dark plumage and medium size, mirrors the silhouette and color of these drones, making it an excellent candidate for comparison. Similarly, the sparrow, though smaller, provides a useful contrast in terms of its compact structure and more varied coloration patterns. The goose, larger and more robust, offers a perspective on how larger avian shapes might be perceived in relation to drone size.Lastly, the magpie, with its distinct black and white markings, closely resembles the visual complexity of the drones, providing a useful analog for studying visual recognition and differentiation. This selection was made to ensure that our observations and findings were grounded in realistic comparisons, thereby enhancing the validity and applicability of our research.Fig. 2. Avian species featured in LAT-BirdDrone.Fig 2
To simulate the challenges of aerial target classification in real-world environments, we recorded visible-light bird flight videos across a variety of backgrounds, including residential areas, parks and scenic spots, urban lakes, and reservoir shorelines (Fig. 3). These background settings were chosen for their representativeness, complexity, and potential sources of interference.Fig. 3. Low-altitude unmanned aerial vehicles captured using infrared imaging.Fig 3
In addition, for infrared bird flight recordings, we specifically captured extensive footage of geese in flight over reservoirs and riverbanks (Fig. 4). We also recorded videos of crows in flight at locations where they commonly gather, encompassing various flight states and diverse backgrounds.Fig. 4. Low-altitude birds captured using infrared imaging.Fig 4
Finally, we recorded drone flight videos across a range of environments, including construction sites, suburban areas, open plains, and mountainous regions (Fig. 5).Fig. 5. Low-altitude unmanned aerial vehicles captured using infrared imaging.Fig 5
From the raw video dataset, we carefully selected a total of 65 videos, comprising 15 videos containing only drones and 50 videos containing only birds.
To capture these videos, we employed the Three-band refrigeration photoelectric turntable Z50IV-CTVC690110–21,100 (Fig. 6), a multi-functional optoelectronic sensor integrating high-definition visible-light cameras, cooled thermal imaging, and optional laser ranging modules. Designed for all-weather, full-time target detection, positioning, tracking, recognition, and tracing, this turntable is ideally suited for three-dimensional security protection in critical sites like airports, ports, and nuclear/biochemical facilities. Leveraging its multi-spectral capabilities, we recorded visible-light and infrared videos of low-altitude bird and drone flights, forming the core data source for our study. These videos were captured at resolutions of 1920×1080 and 640×512, with a frame rate of 25 fps. Before annotating the data, we selected videos where the camera remained stable, the number of flying targets was minimal, and the bird distribution was relatively sparse. Additionally, we removed bird flight video data with a duration of <3 s, as well as blurry footage caused by lens transitions, in order to minimize external interference and reduce the risk of incorrect annotations. We manually annotated the attributes of each flying target in the videos, including position, size, category (drone or bird), completeness, and background clarity. During the annotation validation process, we merged the annotation results with the original videos and carefully verified the accuracy of the annotations.Fig. 6. Three-band refrigeration photoelectric turntable Z50IV-CTVC690110–21,100.Fig 6
These selections constitute a dataset containing 33,262 frames, with a total duration of 23.72 min and a time resolution of 25 fps. Considering that flying targets may appear at different times in the video and that operational factors may disrupt the continuity of their trajectories, we decided against standardizing the video duration. Among these videos, 19,505 frames were manually annotated as bird images, and 13,757 frames were manually annotated as drone images. Regarding the LAT-BirdDrone dataset, we have generated the following images (Fig. 7, Fig. 8, Fig. 9):Fig. 7. Pie chart of category distributionFig 7Fig. 8Scatter plot of bounding box dimensions.Fig 8. Fig. 9Histogram of aspect ratio distribution for drones and birds.Fig 9
We obtained a mixed dataset of 665 trajectories, comprising both bird and drone tracks, to test the classification accuracy of various models using the YOLOv11n detection and tracking algorithm. The bounding box size distribution is detailed in Fig. 10, while the trajectory length distribution is illustrated in Fig. 11. The trajectory statistics details table is shown in Table 1.Fig. 10. Bounding box size distributionFig 10Fig. 11Distribution of trajectory length.Fig 11. Table 1Detailed trajectory statistics of the LAT-BirdDrone Dataset.Table 1. AttributeSubcategoryTrajectory CountAverage Frame LengthMin Frame LengthMax Frame LengthModal CoverageTarget TypeBird3504910128Visible+InfraredDrone3154710112Visible+InfraredBird SpeciesCrow925112128Visible+InfraredSparrow88461095Visible+InfraredGoose855315118Visible+InfraredMagpie854710102Visible+InfraredDrone ModelDJI Air82451098Visible+InfraredDJI Inspire794912112Visible+InfraredMavic ProDJI Mini77774846101010596Visible+InfraredVisible+InfraredScenarioUrban Area1424810115Visible + InfraredFarmland1384710108Visible + InfraredLake/Reservoir1355012128Visible + InfraredConstruction Site85451092Visible + InfraredSuburb83461095Visible + InfraredMountainous Region824710102Visible + InfraredWeather/ LightingClear Daytime3204810120Visible + InfraredDusk/Dawn1854710118Visible + InfraredRain/Snow/ Sandstorm1604812128Infrared(primary
Experimental Design, Materials and Methods
4
We present a set of benchmark and baseline experiments aimed at investigating low- altitude bird and drone detection, as well as the classification of bird and drone trajectories. The evaluated algorithms were run in an environment equipped with an 8GB RTX 4070 GPU, 32GB RAM, and an Intel(R) Core(TM) i9–14900HX 2.20 GHz processor, implemented using the PyTorch deep learning framework. During the training of YOLOv11n, a batch size of 8 was used, with an initial learning rate set to 0.01, which was adjusted using a cosine annealing scheduler. The optimizer chosen was stochastic gradient descent (SGD), and the model was trained for 300 epochs.
(1) Data Splitting Strategy
To ensure the reliability and generalization of model evaluation, the LAT-BirdDrone dataset was divided into Training (Train), Validation (Val), and Test subsets at the video level, rather than at the trajectory or frame level, to prevent data leakage. The core principles and details of this strategy are as follows:
Segmentation levels and leakage prevention measures
4.1
All trajectories derived from a single video were assigned exclusively to one subset (Train/Val/Test). This approach ensured that no trajectories from the same video appeared in multiple subsets, thereby avoiding potential data leakage that could arise from overlapping trajectories with similar motion patterns or background contexts, which might bias the model's performance.
Split ratio and sampling method
4.2
A stratified sampling method was employed to maintain a consistent distribution of key attributes across the subsets. These attributes included target type (bird/drone), scenario (urban/farmland/lake), weather conditions (clear/rain/snow), and lighting conditions (dawn/dusk/daytime). The final split ratio was set at Train:Val:Test = 60 %:20 %:20 %, ensuring balanced representation:
- 1)Train Set: 39 videos (29 bird-only, 10 drone-only) accounting for 399 trajectories (210 bird, 189 drone);
- 2)Val Set: 13 videos (9 bird-only, 4 drone-only) accounting for 133 trajectories (70 bird, 63 drone);
- 3)Test Set: 13 videos (12 bird-only, 4 drone-only) accounting for 133 trajectories (70 bird, 63 drone).
The Test set includes 200 manually annotated ground truth trajectories, supplemented by additional frame-level annotations to facilitate accurate calculation of tracking metrics. This meticulous annotation ensures the precision and reliability of the evaluation metrics.
(2) Bird and Drone Detection
We employed the YOLOv11n algorithm to detect low-altitude small targets such as birds and drones [1]. YOLOv11n is the latest version in the YOLO series, specifically optimized for the efficient detection of small targets [2]. We collected a low-altitude flight video dataset containing various bird species and drones, under different weather, lighting, and background conditions. The dataset was annotated to ensure accurate bounding boxes and class labels for each target [[3], [4], [5]]. First, we preprocessed the dataset, including image resizing, normalization, and data augmentation, to improve the model’s generalization capability [[6], [7], [8]]. We then fine-tuned the pre-trained YOLOv11n model on our dataset [3,[9]](#bib0009). The training set consisted of 33,262 frames containing motion-blurred drone infrared thermal imagery and bird flight data. During training, we applied adaptive learning rates and mixed precision training to accelerate convergence and reduce GPU memory usage.
The detection results are shown in Fig. 12, Fig. 13.Fig. 12YOLOv11n low-altitude bird detection results.Fig 12. Fig. 13YOLOv11n drone detection results.Fig 13
(3) Trajectory Extraction Process and Algorithm Comparison Experiments
To construct a high-completeness low-altitude small target trajectory dataset, the choice of multi-object tracking (MOT) algorithm directly impacts the quality of the trajec- tories. To further enhance the accuracy and reliability of target trajectory extraction, we employed the ByteTrack algorithm [10]. ByteTrack achieves multi-object tracking by associating every detection box rather than merely high-confidence detection boxes, thereby maintaining high precision while providing excellent real-time performance. We selected ByteTrack due to its outstanding performance in managing occlusions and the reappearance of targets, which is particularly crucial for low-altitude micro-target tracking. The formula is as follows:
Let t represent the detection bounding box, represent the predicted tracking bounding box, represent the detection confidence, and and represent the weight coefficients.
The motion model algorithm of BoT-SORT (Kalman filter prediction) is as follows:
Let denote the state vector, F the state transition matrix, P the covariance matrix, the process noise, and Q the noise covariance.
Simultaneously, we also conducted comparative tests with the BoT-SORT algorithm [11], which integrates motion and appearance information along with camera motion compensation techniques. By comparing the performance of different algorithms based on metrics such as the number of trajectories, continuity, and ID stability, we ultimately selected the tracking strategy that best suits our dataset. Core component of BoT-SORT (Appearance Feature Similarity):
Let and represent the appearance embedding vectors of the detection and the trajectory, respectively.
A systematic comparison is conducted based on metrics such as the number of tra-jectories, continuity, and ID stability [[12], [13], [14], [15]], ultimately selecting the algorithm with the best tracking performance as the foundation for subsequent trajectory classification tasks. The lightweight YOLOv11n model (input resolution 640×640) is uniformly used to generate the initial detection results [[16], [17], [18], [19]], with a detection model confidence threshold of 0.15 and an IOU threshold set to 0.25 to ensure a high recall rate for small targets [[20], [21], [22]].
We quantify the algorithm’s performance on the test set (which contains 200 manually annotated ground truth trajectories) using the following metrics:
- 1)Number of trajectories: The total number of successfully associated complete trajec- tories (length ≥ 10 frames);
- 2)ID Switches (IDS): The number of times a single target is incorrectly assigned a new ID;
- 3)Fragmentation rate (Frag. Rate): The ratio of the number of trajectory breaks due to tracking failures to the total number of trajectories.
Let represent the number of breaks for the k-th trajectory, and repre- sent the total number of trajectories.
- MOTA (Multiple Object Tracking Accuracy): An overall accuracy metric that com- prehensively considers false positives, false negatives, and ID switches:
Where is the number of false negatives at frame t, is the number of false positives, is the number of ID switches, and is the number of ground truth targets.
We selected the same 25 videos to test the number of tracking trajectories of two algorithms. We also selected frames from the beginning, middle, and end of one video. Fig. 14, Fig. 15 show the infrared light crops and visible light crops of the bird tracking trajectory obtained by the ByteTrack algorithm, respectively.Fig. 14. Bird trajectory tracking results from infrared imaging.Fig 14. Fig. 15Bird trajectory tracking results from visible light imaging.Fig 15
ByteTrack significantly improves track completeness (+13.6 %) by retaining low-confidence detection boxes, demonstrating stronger robustness especially when targets are temporarily occluded (e.g., birds flying through branches). BoT-SORT compensates for motion model errors using appearance features, reducing ID switches by 44.4 %. However, its reliance on high-resolution ReID features makes it prone to feature degradation in low-altitude small target scenarios (average pixel area ≤100pt2), leading to missed tracks. Although BoT-SORT has a slightly higher MOTA, its computational overhead (average pro-cessing time of 38 ms per frame) is 72.7 % higher than ByteTrack (22 ms), making it difficult to meet the frame rate requirements of real-time monitoring systems (≥25FPS). Based on the core objective of “maximizing track sample cardinality”, this study selects ByteTrack as the basic tracking algorithm. To mitigate its ID switch issues, a two-stage post-processing approach is adopted:
- 1.Track interpolation smoothing: For tracks with short-term breaks (interval ≤5 frames), linear interpolation is performed based on a kinematic model (constant acceleration) to complete the tracks;
- 2.Manual verification and correction: The annotation team uses the CVAT tool to manually merge tracks with abnormal ID switches, ensuring ID consistency in the final dataset.
Following post-processing optimization, the dataset’s trajectory count increased to 665(comprising 350 avian and 315 unmanned aerial vehicle trajectories), with a mean duration of 48 frames. The fragmentation rate was reduced to 4.3 %, thus meeting the requirements for spatio-temporal feature modeling. This comparative experiment not only provides methodological support for dataset construction but also offers empirical reference for algorithm selection in low-altitude Multiple Object Tracking (MOT) scenarios.
(4) Baseline Benchmark Experiments to Demonstrate LAT-BirdDrone Utility
We conducted a comprehensive comparison of several mainstream models, including Convolutional Neural Networks (CNN) [23,[24]](#bib0025), Transformer models, improved Transformer models (iTransformer), Long Short-Term Memory networks (LSTM) [25,[26]](#bib0027), Bidirectional Long Short-Term Memory networks (BiLSTM) [25], and Gated Recurrent Units (GRU) [25,[26]](#bib0027). These models were applied to classification tasks involving birds [[27], [28], [29]] and drones. Initially, each of these models was trained independently and their performance was evaluated on the datasets. Subsequently, we tested existing hybrid architectures (e.g., Transformer+LSTM, CNN+BiLSTM) as baseline experiments, aiming to demonstrate the utility of LAT-BirdDrone in supporting multi-architecture model evaluation [30,[31]](#bib0032). The hybrid models included combinations such as Transformer+LSTM, CNN+BiLSTM, iTransformer+CNN, Transformer+LSTM+CNN, and iTransformer+CNN+GRU. The performance results of these models are detailed in Table 4. Furthermore, we compiled statistics on the correctly and incorrectly classified trajectories for both birds and UAVs, as presented in Table 3. This comprehensive analysis provided insights into the effectiveness of different model architectures and their combinations for the specified classification tasks.Table 2. Comparative experimental results of ByteTrack algorithm and BoT-SORT algorithm.Table 2. IndexByteTrackBoT-SORTTrack Count184162ID Switch2715Frag.Rate18.212.7MOTA72.475.8Note: Table 2 demonstrates a performance comparison between the ByteTrack and BoT-SORT algorithms under the same detection input, focusing on metrics including track count, ID Switch, frag.rate, and MOTA.Table 3. Baseline benchmark of trajectory classification results (birds vs. drones) on LAT-BirdDrone.Table 3. TypeModelCorrect TrajectoriesIncorrect TrajectoriesBirdCNNTransformerLSTMiTransformerLSTMTransCNNBiLSTMCNNiTransCNNLSTMTransCNNGRUiTrans312299285320293320330304****3004535493741372730****34DroneCNNTransformerLSTMiTransformerLSTMTransCNNBiLSTMCNNiTransCNNLSTMTransCNNGRUiTrans294275261257284287283307****3091557715248222625****23Table 4. Baseline benchmark of trajectory classification performance metrics on LAT-BirdDrone.Table 4. TypeDatasetModelAccuracyPrecisionRecalF1ROC-AUCBirdDroneValTestCNNTransformerLSTMiTransformerLSTMTransCNNBiLSTMCNNiTransCNNLSTMTransCNNGRUiTransCNNTransformerLSTMiTransformerLSTMTransCNNBiLSTMCNNiTransCNNLSTMTransCNNGRUiTrans90.6682.8381.3385.5484.9490.9690.0689.76****89.7690.9986.1981.9886.6486.6491.1492.0491.74****91.4492.9082.1880.5784.7884.0092.4489.0793.55****93.5595.4183.9980.0686.0285.9293.5792.7092.40****92.8889.2084.6283.4388.6486.9890.3492.6185.80****85.8087.3989.5285.3389.6487.7289.6492.4491.02****89.8291.1083.3881.9886.6785.4791.3890.8189.51****89.5191.2386.6782.6187.7986.8191.5692.5791.70****91.3297.1091.9389.8592.4191.7797.3597.1095.57****95.3998.3693.6091.3092.8693.7997.9798.2197.75****97.81ValTestCNNTransformerLSTMiTransformerLSTMTransCNNBiLSTMCNNiTransCNNLSTMTransCNNGRUiTransCNNTransformerLSTMiTransformerLSTMTransCNNBiLSTMCNNiTransCNNLSTMTransCNNGRUiTrans90.6682.8381.3385.5484.9490.9690.0689.76****89.7690.6686.1981.9886.6486.6491.1492.0491.74****91.4488.3483.5482.1786.4985.9989.3891.2886.44****86.4486.7388.7184.1987.4187.3888.5891.2991.10****90.0992.3180.9879.1482.0582.8291.6787.1893.87****93.8795.1582.8378.6183.1785.5492.8891.5992.47****93.0790.2882.2480.6384.2184.3890.5189.1890.00****90.0090.7485.6781.3185.2486.4590.6891.4491.78****91.5697.1091.9389.8592.4191.7797.3597.9195.57****95.3998.3693.6091.3092.8693.79**97.9798.2197.75**97.81Note: Val and Test sets are split at the video level (stratified sampling, Train: Val:Test = 60 %:20 %:20 %) to avoid data leakage; see “Data Splitting Strategy” for details. Trajectory statistics (e.g., count by species/model) are provided in Table 1.
Firstly, in terms of multi-architecture model support, the baseline experiments encompassed single models (such as CNN, Transformer, LSTM) and hybrid architectures (such as CNN+BiLSTM, Transformer+LSTM). All models were able to produce effective results based on the LAT-BirdDrone dataset, demonstrating that the dataset can serve as a unified benchmark for “low-altitude micro-target trajectory classification,” accommodating evaluation needs for different types of models. Secondly, regarding high annotation quality and reliability, after optimization with ByteTrack, the dataset's trajectory fragmentation rate was reduced to 4.3 %. The 665 complete trajectories enabled models (such as CNNiTrans) to achieve an accuracy of 92.04 % and a ROC-AUC of 98.21 % on the test set, indicating that the dataset's trajectory annotations have high continuity and class separability, providing a reliable basis for model training and testing. Finally, with respect to complex scene coverage and practicality, the dataset includes six types of backgrounds (urban, farmland, lake), special lighting conditions (dawn/dusk), and adverse weather (rain, snow, sandstorm). The baseline results showed that hybrid models performed consistently well on samples with complex backgrounds, demonstrating that the dataset can simulate real low-altitude monitoring environments, thus supporting subsequent applications of models in real-world scenarios.
(5) Trajectory Classification Visualization
We used the Transformer+LSTM+CNN hybrid model, which demonstrated high classification accuracy, to classify birds and drones, and obtained relevant images of correct and incorrect trajectories. Trajectory classification visualization and velocity distributionhistograms aid in analyzing classification errors and optimizing models. Trajectory classificationvisualization demonstrates spatial differences between correctly and incorrectly classified samples, typically represented on a 2D plane. It reveals motion patterns misclassi- fied by the model, such as rapidly turning birds misidentified as drones. This visualization guides feature engineering, especially for easily misclassified patterns (Fig. 16a). Speed-time curves provide a temporal analysis of motion patterns by visualizing speed variations over time, offering critical insights into classification performance across different object categories. This analytical approach reveals characteristic movement signatures: drones typically exhibit intermittent high-speed bursts with rapid acceleration phases, while birds demonstrate more stable speed profiles with periodic flapping-induced fluctuations (Fig. 16b).Fig. 16. Transformer+LSTM+CNN bird and drone trajectory density maps.Fig 16
Trajectory density maps use color gradients to show sample distribution density, revealing hotspots with high misclassification rates. These provide reference for model optimization, particularly significant for improvements in difficult-to-classify areas (Fig. 17).Fig. 17. Transformer+LSTM+CNN bird and drone trajectory density maps.Fig 17
This polar histogram (Fig. 18) visually depicts the directional concentration trends in the trajectories of birds and drones. By mapping angular distributions, it reveals the dominant movement directions and potential orientation preferences inherent in their flight patterns.Fig. 18. Direction distribution diagram (polar coordinates).Fig 18
This two-dimensional heatmap (Fig. 19) correlates spatial positions with curvature values to identify regions of high curvature—such as sharp turns or hovering behavior—in the trajectories of birds and drones. Intense color gradients highlight areas where rapid changes in direction occur, providing insights into maneuvering strategies and environmental interactions.Fig. 19. Curvature heatmap.Fig 19
The frequency spectrum diagram (Fig. 20) identifies periodic components in both normal and abnormal trajectories of drones. Frequency peaks correspond to specific cycle engths, allowing the detection of rhythmic patterns, mechanical vibrations (e.g., propeller- induced oscillations), or systematic errors that may indicate malfunctions or deviations from standard flight paths.Fig. 20. Fourier spectrum analysis.Fig 20
Dataset limitations
This study exhibits several limitations that warrant in-depth discussion, as these constraints not only reflect potential gaps in the current work but also provide actionable directions for future dataset optimization and model refinement. Firstly, the LAT-BirdDrone dataset presents a notable class imbalance due to the unequal ratio of drone-to-bird videos, with 15 drone-only videos and 50 bird-only videos resulting in a ratio of 1:3.3; this imbalance directly impacts the training and validation of trajectory classification models. During training, the model tends to prioritize learning features of the majority class (birds) to minimize overall loss, leading to biased generalization—while it may achieve high accuracy in bird trajectory classification, the misclassification rate for the minority class (drones) increases significantly. For instance, experimental results of single models (Table 3) show that the CNN model misclassified only 15 drone trajectories but 45 bird trajectories, partially reflecting the model’s over-adaptation to bird features. In validation, the imbalanced sample distribution may overestimate the model’s overall performance: since bird samples account for a larger proportion of the validation set, metrics such as accuracy and F1-score are skewed toward bird classification results, failing to truly reflect the model’s ability to recognize drone trajectories—a key target for low-altitude security applications. Future solutions to this issue include augmenting drone video data via techniques like frame interpolation or synthetic trajectory generation, adopting class-balanced loss functions (e.g., focal loss) to adjust the weight of minority-class samples during training, and collecting additional drone flight videos to balance the sample ratio.
Secondly, the dataset’s limited diversity in target types—covering only 4 drone models (DJI Air, DJI Inspire, Mavic Pro, DJI Mini) and 4 bird species (crow, sparrow, goose, magpie)—restricts its generalization ability and that of the trained models to real-world low-altitude targets. For drones, the selected models are all consumer-grade mainstream products from DJI, excluding other types of low-altitude small targets such as industrial micro-drones (e.g., Parrot Anafi) or custom-made mini-drones; these unincluded drones may exhibit distinct motion patterns (e.g., slower hovering speeds for industrial drones) or appearance features (e.g., different numbers of propellers), which could reduce the model’s classification accuracy when encountering such targets. For birds, the 4 selected species were chosen for their visual similarity to drones but do not cover small birds (e.g., swallows) or large birds of prey (e.g., eagles) common in low-altitude environments; these excluded birds have more diverse flight dynamics (e.g., gliding in eagles vs. flapping in sparrows) that the current dataset fails to capture, limiting the model’s adaptability to varied avian motion patterns. Subsequent dataset updates should expand the range of target types to include non-DJI drone models and additional bird species, thereby enhancing the dataset’s representativeness for real low-altitude monitoring scenarios.
Thirdly, the dataset’s mean trajectory length of 48 frames (equivalent to ∼1.92 s at 25 fps) poses challenges for temporal modeling of extremely short or extremely long trajectories. For extremely short trajectories (e.g., <10 frames, ∼0.4 s), the limited number of frames fails to capture sufficient spatiotemporal features (e.g., short-term acceleration or direction changes of targets), leading to high fragmentation rates even after post-processing (current fragmentation rate: 4.3 %) and increasing the model’s difficulty in distinguishing between transient drone movements (e.g., sudden takeoffs) and bird movements (e.g., quick escapes). For extremely long trajectories (e.g., >100 frames, ∼4 s), the current mean length does not cover long-duration motion patterns (e.g., continuous cruising of drones or long-distance gliding of birds); models relying on temporal dependencies (e.g., LSTM, Transformer) may suffer from long-sequence information degradation—for example, the gradient vanishing issue in LSTM could weaken its ability to capture long-term motion trends of targets. Future improvements to address this limitation include collecting trajectory data of varying durations to cover both short (<10 frames) and long (>100 frames) sequences, optimizing trajectory segmentation strategies (e.g., sliding window methods) to adapt to long sequences, and adopting models specialized in long-sequence modeling (e.g., Transformer-XL) to complement the current temporal modeling framework.
Conclusions
Although the LAT-BirdDrone dataset provides an important resource for trajectory classification of low-altitude flying small targets, we recognize that the dataset still has certain limitations in terms of scale. The dataset used in this study comprises 65 videos (15 of drones and 50 of birds) and 665 trajectories, with an average of 48 frames per trajectory. The relatively small number of videos and trajectories may limit the ability to achieve high-precision classification.To overcome these limitations, we plan to expand the LAT-BirdDrone dataset by increasing the scale and diversity of data collection, involving more types of drones and birds. Additionally, we aim to include more diverse target types, encompassing low-altitude flying objects of different sizes, shapes, and motion characteristics. We will also gather data from more complex scenes, such as adverse weather conditions and intricate background environments, and extend the trajectory lengths beyond the current average of 48 frames to support more complex temporal modeling. Moreover, we will consider adding multi-perspective data to provide more comprehensive information about target movements.
These efforts will help to better validate and improve trajectory classification models and advance the field of low-altitude small target monitoring. Through collaborative efforts and the inclusion of diverse data sources, we believe that we can construct a more comprehensive and practically valuable dataset.
Limitations
None
Ethics Statement
We hereby confirm that the authors have read and follow the ethical requirements for publication in Data in Brief. Additionally, we confirm that the current work does not involve human subjects, animal experiments, or any data collected from social media platforms.
CRediT Author Statement
Ming Ke: Conceptualization, Methodology, Software; Xin Kang: Data curation, Writing, Original draft preparation; Zixuan Zhang: Visualization, Investigation; Lubing Wang: Supervision; Gang Wang: Software, Validation.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Zhang Z.Drone-YOLO: an efficient neural network method for target detection in Drone images Drones 78202310.3390/drones 7080526526-526 · doi ↗
- 2Wang Z.Su Y.Kang F.Wang L.Lin Y.Wu Q.Li H.Cai Z.PC-YOLO 11s: a lightweight and effective feature extraction method for small target image detection Sens. (Basel Switz.)252202510.3390/s 25020348 PMC 1176838439860718 · doi ↗ · pubmed ↗
- 3Al-Masni M.A.Al-Antari M.A.Park J.M.Gi G.Kim T.Y.Rivera P.Valarezo E.Choi M.T.Han S.M.Kim T.S.Simultaneous detection and classification of breast masses in digital mammograms via a deep learning YOLO-based CAD system Comput. Methods Programs Biomed.1572018859410.1016/j.cmpb.2018.01.01729477437 · doi ↗ · pubmed ↗
- 4Feuer B.Joshi A.Cho M.Chiranjeevi S.Deng Z.Balu A.Singh A.Sarkar S.Merchant N.Singh A.Ganapathysubramanian B.Hegde C.Zero-shot insect detection via weak language supervision Plant Phenome J.71202410.1002/ppj 2.20107 · doi ↗
- 5Hasan A.Diepeveen D.Laga H.Jones M.Sohel F.Object-level benchmark for deep learning-based detection and classification of weed species Crop Prot.177202310.1016/j.cropro.2023.106561106561-106561 · doi ↗
- 6Islam M.Rahman M.Ali M.Nafi A.Alam M.Godder T.Miah M.Islam M.Enhancing breast cancer segmentation and classification: an ensemble Deep Convolutional Neural network and U-net approach on ultrasound images Mach. Learn. Appl.16202410.1016/j.mlwa.2024.100555100555-100555 · doi ↗
- 7Himel G.M.S.Islam M.M.Al-Aff K.A.Karim S.I.Sikder M.K.U.Skin cancer segmentation and classification using vision transformer for automatic analysis in dermatoscopy-based noninvasive digital system Int. J. Biomed. Imaging 2024302219210.1155/2024/30221922024 PMC 1085879738344227 · doi ↗ · pubmed ↗
- 8Ul Rehman S.Tarek N.Magdy C.Kamel M.Abdelhalim M.Melek A.N Mahmoud L.Sadek I.AI-based tool for early detection of Alzheimer's disease Heliyon 1082024 e 2937510.1016/j.heliyon.2024.e 29375 PMC 1103312838644855 · doi ↗ · pubmed ↗
