# PDGV-DETR: Object Detection for Secure On-Site Weapon and Personnel Location Based on Dynamic Convolution and Cross-Scale Semantic Fusion

**Authors:** Nianfeng Li, Peizeng Xin, Jia Tian, Xinlu Bai, Hongjie Ding, Zhiguo Xiao, Qian Liu

PMC · DOI: 10.3390/s26051542 · Sensors (Basel, Switzerland) · 2026-02-28

## TL;DR

This paper introduces PDGV-DETR, a new object detection framework optimized for detecting weapons and personnel in security surveillance images with high accuracy and robustness.

## Contribution

The novel contribution is PDGV-DETR, which uses dynamic convolution and cross-scale fusion to improve detection accuracy and robustness in complex security scenarios.
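The contribution rests partly on dynamic convolution. This summary does not specify the internals of the paper's "dynamic hierarchical channel interaction convolution" module, so the sketch below shows only the generic idea, in the style of Dynamic Convolution (Chen et al., CVPR 2020). An input-conditioned attention distribution mixes K candidate kernels into one kernel, which is then applied once. It works on a single-channel 1D signal in plain Python. The helper names, the single pooled context value, and the one-layer attention are illustrative assumptions, not the authors' design.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; a temperature > 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv1d(
    x: Sequence[float],
    kernels: Sequence[Sequence[float]],
    attn_weights: Sequence[float],
    attn_bias: Sequence[float],
    temperature: float = 1.0,
) -> List[float]:
    """Generic single-channel 1D dynamic convolution (hypothetical helper).

    1. Pool the input to one context value (global average pooling).
    2. Map the context to K logits with a tiny linear layer; softmax gives pi_k.
    3. Aggregate one input-specific kernel: W(x) = sum_k pi_k * W_k.
    4. Run a single 'same'-padded convolution with W(x).
    """
    size = len(kernels[0])
    if size % 2 == 0 or any(len(k) != size for k in kernels):
        raise ValueError("all kernels must share one odd length")
    if not len(attn_weights) == len(attn_bias) == len(kernels):
        raise ValueError("need one attention weight and bias per kernel")

    context = sum(x) / len(x)
    pi = softmax([a * context + b for a, b in zip(attn_weights, attn_bias)], temperature)
    kernel = [sum(p * k[j] for p, k in zip(pi, kernels)) for j in range(size)]

    half = size // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half  # cross-correlation, as in deep-learning "conv" layers
            if 0 <= idx < len(x):  # zero padding outside the signal
                acc += w * x[idx]
        out.append(acc)
    return out
```

With all attention weights and biases at zero, the two kernels are averaged into `[0.5, 1, 0.5]`, so `dynamic_conv1d([1, 2, 3], [[0, 1, 0], [1, 1, 1]], [0, 0], [0, 0])` returns `[2.0, 4.0, 4.0]`. Aggregating kernels before convolving keeps the cost close to one convolution regardless of K, which is the usual argument for this design.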

## Key findings

- PDGV-DETR achieved a peak mAP50 of 85.9% on a conflict-scene dataset; its mean mAP50 (0.858 ± 0.004) exceeded that of the RT-DETR baseline (0.840 ± 0.007), a statistically significant difference (p < 0.01).
- On the OD-WeaponDetection dataset, PDGV-DETR reached 93.0% mAP for gun and knife detection, a 2.2% improvement over RT-DETR.
- For personnel localization, the model improved detection accuracy by 15.1% compared with Deformable DETR.
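The findings are reported as mAP50: per-class average precision (AP) with a detection counted as correct only when its box overlaps an unmatched ground-truth box with IoU ≥ 0.5, averaged over classes. The summary does not say which evaluation code the authors used. The sketch below assumes Pascal VOC-style greedy matching and all-point interpolated AP; COCO-style evaluators interpolate at 101 recall points instead, so small numerical differences are expected. The function names and the data layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[str, float, Box]  # (image_id, confidence, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: Sequence[Detection], gts: Dict[str, List[Box]], thr: float = 0.5) -> float:
    """AP for one class: greedy VOC-style matching, all-point interpolation."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp = fp = 0
    recall, precision = [], []
    for img, _conf, box in sorted(dets, key=lambda d: d[1], reverse=True):
        overlaps = [iou(box, gt) for gt in gts.get(img, [])]
        best = max(range(len(overlaps)), key=overlaps.__getitem__, default=-1)
        if best >= 0 and overlaps[best] >= thr and not used[img][best]:
            used[img][best] = True  # each ground-truth box can be matched once
            tp += 1
        else:
            fp += 1  # low IoU, no ground truth in the image, or a duplicate hit
        recall.append(tp / n_gt)
        precision.append(tp / (tp + fp))
    # Envelope: make precision non-increasing, then sum the area under the step curve.
    mrec = [0.0] + recall + [1.0]
    mpre = [0.0] + precision + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def map50(per_class: Dict[str, Tuple[Sequence[Detection], Dict[str, List[Box]]]]) -> float:
    """mAP50: mean of per-class AP at an IoU threshold of 0.5."""
    aps = [average_precision(dets, gts, 0.5) for dets, gts in per_class.values()]
    return sum(aps) / len(aps)
```

Ranking matters: with one ground-truth box, a correct detection scored above a false positive gives AP = 1.0, while the same two detections in the opposite score order give AP = 0.5.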

## Abstract

In public safety scenarios, precisely detecting and locating prohibited weapons such as firearms and knives, together with the personnel involved, is a core prerequisite for violent-risk warning and emergency response. However, security surveillance scenes commonly suffer from object occlusion, small weapons that are hard to capture, and cluttered backgrounds. As a result, existing general-purpose object detectors adapt poorly to detecting and locating security-related objects, achieve low accuracy, and lack robustness in complex scenes. This paper therefore proposes PDGV-DETR, a threat object detection framework for security scenarios based on adaptive dynamic convolution and cross-scale semantic fusion, specifically optimized for detecting and locating weapons and personnel in static security surveillance images. The work addresses object-level category recognition and pixel-level spatial localization; it does not classify or identify violent behaviors from temporal information, and the two tasks have clear technical boundaries and different scene constraints. The framework is built on three core modules: (1) a dynamic hierarchical channel interaction convolution module that reduces computational complexity while improving the detection of occluded and incomplete objects; (2) an improved bidirectional hybrid feature pyramid network combined with a cross-scale fusion module, which strengthens multi-scale feature representation so that small weapons and large personnel objects can be detected simultaneously; and (3) a global semantic weaving and elastic feature alignment network that addresses the low discrimination between objects and complex backgrounds.
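The bidirectional feature pyramid fuses features from different scales in both top-down and bottom-up directions. The abstract does not give the exact fusion rule, so the sketch below assumes the fast normalized fusion of EfficientDet's BiFPN (Tan et al., 2020), O = Σᵢ wᵢ·Iᵢ / (ε + Σⱼ wⱼ) with wᵢ = ReLU(raw weight), applied at one hypothetical top-down node. It fuses a fine level with a 2x nearest-neighbour upsampled coarse level. It uses one feature channel as nested lists; the convolution that normally follows each fused sum is omitted, and all names are illustrative.

```python
from typing import List, Sequence

Grid = List[List[float]]  # one feature channel, indexed [row][col]


def upsample2x_nearest(fm: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling so a coarser level matches a finer one."""
    out: Grid = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fast_normalized_fusion(maps: Sequence[Grid], raw_weights: Sequence[float], eps: float = 1e-4) -> Grid:
    """BiFPN-style fusion: O = sum_i w_i * I_i / (eps + sum_j w_j), with w_i = ReLU(raw_i)."""
    if not maps or len(maps) != len(raw_weights):
        raise ValueError("need one weight per input map")
    h, w = len(maps[0]), len(maps[0][0])
    if any(len(m) != h or len(m[0]) != w for m in maps):
        raise ValueError("resample all inputs to a common resolution first")
    relu = [max(0.0, r) for r in raw_weights]  # keeps every weight non-negative
    norm = eps + sum(relu)  # eps avoids division by zero
    coeffs = [r / norm for r in relu]
    return [[sum(c * m[y][x] for c, m in zip(coeffs, maps)) for x in range(w)] for y in range(h)]


def top_down_node(fine: Grid, coarse: Grid, raw_weights: Sequence[float]) -> Grid:
    """One top-down fusion node (hypothetical): fine level + upsampled coarse level."""
    return fast_normalized_fusion([fine, upsample2x_nearest(coarse)], raw_weights)
```

Because the weights are learned per input, the network can lean on the coarse, semantically strong level for large personnel objects and on the fine level for small weapons; normalizing by the weight sum instead of a softmax is the cheaper choice BiFPN makes.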
Under the same experimental configuration, the proposed model was compared with current mainstream models on representative datasets. On a dataset of 2,421 images of violent conflict scenes involving personnel, PDGV-DETR reached a peak mAP50 of 85.9%. In the statistical verification, PDGV-DETR scored 0.858 ± 0.004 (mean ± standard deviation) against 0.840 ± 0.007 for the baseline RT-DETR, a statistically significant improvement (p < 0.01). The model also accurately localizes personnel regions, improving accuracy by 15.1% relative to Deformable DETR. On the weapon-specific OD-WeaponDetection dataset, PDGV-DETR reached 93.0% mAP for gun and knife detection, 2.2% higher than RT-DETR. Whereas other general-purpose detectors show fluctuating performance in complex security scenes, PDGV-DETR not only detects and localizes security-related objects more accurately but also markedly improves generalization and stability. The results show that PDGV-DETR balances localization accuracy, detection accuracy, and computational efficiency, performing end-to-end detection and localization of weapons and personnel in static security surveillance images with highly competitive performance. It provides core object-level preprocessing support for public-area monitoring, intelligent video surveillance, and early warning of violent risks, and supplies basic data for subsequent violent-behavior recognition based on temporal data.
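The significance claim compares 0.858 ± 0.004 with 0.840 ± 0.007. The abstract gives neither the number of runs behind these figures nor the test used. The sketch below therefore assumes Welch's unequal-variance t-test computed from summary statistics, and treats the run count as a hypothetical parameter. It returns the t statistic and the Welch-Satterthwaite degrees of freedom; `welch_t` and the constant names are illustrative.

```python
import math
from typing import Tuple


def welch_t(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Works from summary statistics only; sd_a and sd_b are sample standard
    deviations (n - 1 in the denominator), and each group needs n >= 2.
    """
    var_a = sd_a ** 2 / n_a  # squared standard error of mean_a
    var_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


# Summary statistics quoted in the abstract (mAP50, mean ± standard deviation).
PDGV_MEAN, PDGV_SD = 0.858, 0.004
RTDETR_MEAN, RTDETR_SD = 0.840, 0.007

# HYPOTHETICAL run counts: the abstract does not state how many runs were used.
for n_runs in (3, 5, 10):
    t, df = welch_t(PDGV_MEAN, PDGV_SD, n_runs, RTDETR_MEAN, RTDETR_SD, n_runs)
    print(f"n = {n_runs:2d}: t = {t:.2f}, Welch df = {df:.1f}")
```

To turn t and df into a two-sided p-value, evaluate the Student t distribution, for example with SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False)`, which takes the same six summary values. With equal group sizes, t scales with the square root of the run count, so whether p falls below 0.01 depends on a number this summary does not report.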

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12987155/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12987155/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/PMC12987155/full.md

---
Source: https://tomesphere.com/paper/PMC12987155