# BNE-DETR: Nighttime Pedestrian Detection with Visible Light Sensors via Feature Enhancement and Multi-Scale Fusion

**Authors:** Fu Li, Yan Lu, Ming Zhao, Wangyu Wu

PMC · DOI: 10.3390/s26010260 · Sensors (Basel, Switzerland) · 2025-12-31

## TL;DR

This paper introduces BNE-DETR, an improved RT-DETR model that detects pedestrians at night from visible light sensor images more accurately by combining feature enhancement with multi-scale fusion.

## Contribution

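BNE-DETR extends RT-DETR with three modules: SECG, which replaces the bottleneck layer of the C2f component in a lightweight CSPDarknet backbone to strengthen feature representation in low light; AIFI-SEFN, which strengthens the extraction of weak details and the fusion of contextual information; and MANStar, which enhances the representation and fusion of multi-scale pedestrian features.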

## Key findings

- On the LLVIP dataset, BNE-DETR improves Precision, Recall, and mAP50 by 1.9%, 2.5%, and 1.9%, respectively, compared to RT-DETR-R18.
- Relative to the same baseline, the model keeps computational complexity low (48.7 GFLOPs) and reduces the parameter count by 20.2%.
- Cross-dataset experiments confirm robust performance and generalization in nighttime detection tasks.
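The metrics reported above rest on matching predicted boxes to ground-truth boxes by intersection over union (IoU); mAP50 counts a detection as correct when its IoU with a ground-truth box is at least 0.5. The following is a minimal sketch of that matching at a single operating point, not the paper's evaluation code. The function names, the `(x1, y1, x2, y2)` box format, and the greedy matching order are assumptions made for illustration; a full mAP50 computation would additionally sweep confidence thresholds and integrate the precision-recall curve.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Greedy one-to-one matching of predictions to ground truth.

    Assumption: pred_boxes is already sorted by descending confidence.
    """
    matched = set()  # indices of ground-truth boxes already claimed
    true_positives = 0
    for pred in pred_boxes:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gt_boxes):
            if idx in matched:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(pred_boxes) if pred_boxes else 0.0
    recall = true_positives / len(gt_boxes) if gt_boxes else 0.0
    return precision, recall
```

With two predictions and two ground-truth pedestrians where only one pair overlaps, both precision and recall come out at one half: one prediction is a false positive and one pedestrian is missed.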

## Abstract

Pedestrian detection faces significant performance degradation challenges in nighttime visible light environments due to degraded target features, background noise interference, and the coexistence of multi-scale targets. To address this issue, this paper proposes a BNE-DETR model based on an improved RT-DETR. First, we incorporate the lightweight backbone network CSPDarknet and design a Single-head Self-attention with EPGO and Convolutional Gated Linear Unit (SECG) module to replace the bottleneck layer in the original C2f component. By integrating single-head self-attention, the Efficient Prompt Guide Operator (EPGO) dynamic K-selection mechanism, and convolutional gated linear units, it effectively enhances the model’s feature representation capability under low-light conditions. Second, the AIFI-SEFN module, which combines Attention-driven Intra-scale Feature Interaction (AIFI) with a Spatially Enhanced Feedforward Network (SEFN), is constructed to strengthen the extraction of weak details and the fusion of contextual information. Finally, the Mixed Aggregation Network with Star Blocks (MANStar) module utilizes large-kernel convolutions and multi-branch star structures to enhance the representation and fusion of multi-scale pedestrian features. Experiments on the LLVIP dataset demonstrate that our model achieves 1.9%, 2.5%, and 1.9% improvements in Precision, Recall, and mAP50, respectively, compared to RT-DETR-R18, while maintaining low computational complexity (48.7 GFLOPs) and reducing parameters by 20.2%. Cross-dataset experiments further validate the method’s robust performance and generalization capabilities in nighttime pedestrian detection tasks.
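Two of the building blocks named in the abstract, single-head self-attention and gated linear units, can be illustrated in a few lines. The sketch below is a generic, dependency-free version of both operations and is not the SECG module itself: it omits the EPGO K-selection mechanism and the convolution inside the gated unit, uses a sigmoid gate as an assumption, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product attention with one head.

    x is (tokens, dim); w_q, w_k, w_v are (dim, head_dim) projections.
    Returns the attended features and the attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def gated_linear_unit(x, w_value, w_gate):
    """Element-wise product of a value branch and a sigmoid-gated branch.

    Assumption: a plain linear gate stands in for the convolutional
    gate described in the abstract.
    """
    value = matmul(x, w_value)
    gate = matmul(x, w_gate)
    return [
        [v * (1.0 / (1.0 + math.exp(-g))) for v, g in zip(v_row, g_row)]
        for v_row, g_row in zip(value, gate)
    ]
```

Each row of the returned attention weights sums to one, so every token's output is a convex combination of the value vectors. The gate lets the network suppress or pass individual channels, which is one plausible reason such units are used where weak features compete with background noise.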

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12788328/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12788328/full.md

## References

69 references — full list in the complete paper: https://tomesphere.com/paper/PMC12788328/full.md

---
Source: https://tomesphere.com/paper/PMC12788328