# SFE-DETR: An Enhanced Transformer-Based Face Detector for Small Target Faces in Open Complex Scenes

**Authors:** Chenhao Yang, Yueming Jiang, Chunyan Song

PMC · DOI: 10.3390/s26010125 · Sensors (Basel, Switzerland) · 2025-12-24

## TL;DR

This paper introduces SFE-DETR, a transformer-based detector for small faces in open, complex scenes. It uses 28.1% fewer parameters than RT-DETR-R18 and reaches a mAP50 of 94.7% on SCUT-HEAD and 86.3% on the WIDER FACE (Hard) subset.

## Contribution

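SFE-DETR combines three components: a backbone built on inverted residual shift convolution and dilated reparameterization, a multi-head multi-scale self-attention mechanism, and a redesigned feature pyramid (SFE-FPN) with high-resolution layers and a three-branch (local, large-scale, global) fusion module. Together these are intended to preserve and fuse small-face features while reducing the parameter count.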

## Key findings

- SFE-DETR reduces model parameters by 28.1% compared to RT-DETR-R18 while maintaining high accuracy.
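- It achieves a mAP50 of 94.7% and an AP-s of 42.1% on SCUT-HEAD, and a mAP50 of 86.3% on the WIDER FACE (Hard) subset, which the authors report as the best performance among models of the same scale.
- The multi-head multi-scale self-attention fuses multi-scale convolutional features with channel-wise weighting to capture fine-grained facial detail and suppress background noise, and the redesigned SFE-FPN is credited with a significant gain in small face detection.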
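
To make the mAP50 figures above more concrete: under the usual convention, a detection counts as a true positive at mAP50 when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below is only an illustration of that criterion, not the paper's evaluation code; the function names and the box sizes are hypothetical. It shows why localization is harder for small faces: the same 4-pixel offset fails the threshold on a 10×10 box but barely matters on a 100×100 box.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """A prediction matches a ground-truth box when IoU >= threshold (0.5 for mAP50)."""
    return iou(pred, gt) >= threshold


# Hypothetical boxes: a small 10x10 face and a large 100x100 face,
# each predicted with the same 4-pixel horizontal offset.
small_gt, small_pred = (0, 0, 10, 10), (4, 0, 14, 10)
large_gt, large_pred = (0, 0, 100, 100), (4, 0, 104, 100)

print(is_true_positive(small_pred, small_gt))  # small face: offset breaks the match
print(is_true_positive(large_pred, large_gt))  # large face: offset is negligible
```

The mean average precision itself additionally requires ranking detections by confidence and integrating the precision-recall curve, which is omitted here.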

## Abstract

Face detection is an important task in the field of computer vision and is widely applied in various applications. However, in open and complex scenes with dense faces, occlusions, and image degradation, small face detection still faces significant challenges due to the extremely small target scale, difficult localization, and severe background interference. To address these issues, this paper proposes a small face detector for open complex scenes, SFE-DETR, which aims to simultaneously improve detection accuracy and computational efficiency. The backbone network of the model adopts an inverted residual shift convolution and dilated reparameterization structure, which enhances shallow features and enables deep feature self-adaptation, thereby better preserving small-scale information and reducing the number of parameters. Additionally, a multi-head multi-scale self-attention mechanism is introduced to fuse multi-scale convolutional features with channel-wise weighting, capturing fine-grained facial features while suppressing background noise. Moreover, a redesigned SFE-FPN introduces high-resolution layers and incorporates a novel feature fusion module consisting of local, large-scale, and global branches, efficiently aggregating multi-level features and significantly improving small face detection performance. Experimental results on two challenging small face detection datasets show that SFE-DETR reduces parameters by 28.1% compared to the original RT-DETR-R18 model, achieving a mAP50 of 94.7% and AP-s of 42.1% on the SCUT-HEAD dataset, and a mAP50 of 86.3% on the WIDER FACE (Hard) subset. These results demonstrate that SFE-DETR achieves optimal detection performance among models of the same scale while maintaining efficiency.
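
The abstract does not spell out how the dilated reparameterization structure works. One common idea behind such structures, assumed here for illustration, is that a small kernel applied with dilation rate r is equivalent to a larger dense kernel with zeros inserted between its taps, so dilated branches used during training can be folded into a single convolution for inference. The pure-Python sketch below checks that equivalence numerically; the helper names `dilate_kernel` and `conv2d_valid` are hypothetical and not taken from the paper.

```python
def dilate_kernel(kernel, rate):
    """Expand a square k x k kernel into its dense equivalent for a given
    dilation rate by inserting (rate - 1) zeros between neighbouring taps."""
    k = len(kernel)
    size = rate * (k - 1) + 1
    dense = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            dense[i * rate][j * rate] = kernel[i][j]
    return dense


def conv2d_valid(image, kernel, rate=1):
    """'Valid' 2D cross-correlation (no padding, stride 1) with optional dilation."""
    k = len(kernel)
    span = rate * (k - 1) + 1  # receptive field of the dilated kernel
    height, width = len(image), len(image[0])
    out = []
    for y in range(height - span + 1):
        row = []
        for x in range(width - span + 1):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * image[y + i * rate][x + j * rate]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 6x6 input and 3x3 kernel with integer-valued entries.
image = [[float((r * 7 + c * 3) % 11) for c in range(6)] for r in range(6)]
kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]

dilated_out = conv2d_valid(image, kernel, rate=2)           # 3x3 kernel, dilation 2
dense_out = conv2d_valid(image, dilate_kernel(kernel, 2))   # equivalent 5x5 dense kernel
print(dilated_out == dense_out)
```

Because convolution is linear, several such expanded kernels could be summed into one kernel of the largest size; whether SFE-DETR merges its branches in exactly this way is not stated in this summary.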

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12787577/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12787577/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/PMC12787577/full.md

---
Source: https://tomesphere.com/paper/PMC12787577