# LDA-DETR: A lightweight dynamic attention-enhanced DETR for small object detection

**Authors:** Yanli Shi, Jing Li, Yi Jia, Qihua Hong

PMC · DOI: 10.1371/journal.pone.0340977 · PLOS One · 2026-01-30

## TL;DR

This paper introduces LDA-DETR, a lightweight and efficient object detection model that improves performance on small objects using dynamic attention and multi-scale feature fusion.

## Contribution

LDA-DETR combines three components for efficient small object detection: a lightweight feature extraction backbone (LFEB), a Dynamic Multi-Scale Fusion Module (DMSFM), and an Attention-Enhanced Fusion Network (AEFN).

## Key findings

- LDA-DETR is reported to outperform state-of-the-art detection methods on small object detection.
- The LFEB backbone reduces parameters through residual structures and partial convolution, while DMSFM and AEFN strengthen feature representation, targeting deployment under limited computational resources.
- Experiments on the RSOD, NWPU VHR-10, URPC2020, and VisDrone-DET datasets support the approach's suitability for real-world small object detection.

## Abstract

The issues of complex background interference, dense distribution, and insufficient feature representation for small objects have become significant challenges and research hotspots in computer vision. Particularly when the algorithm needs to be deployed in practical applications, many state-of-the-art detectors struggle to balance accuracy and efficiency, often requiring extensive computational power or suffering from degraded detection performance on small objects. To tackle these problems, this paper proposes a lightweight dynamic attention-enhanced DETR (LDA-DETR). Firstly, a lightweight feature extraction backbone (LFEB) is designed to improve the efficiency of object detection under limited computational resources. The proposed backbone enhances gradient flow and reduces the model’s parameters through residual structures and partial convolution operations. Then, a Dynamic Multi-Scale Fusion Module (DMSFM) is proposed to improve the model’s adaptability and the ability to fuse diverse features. The proposed module enhances feature representation ability and inference performance by performing convolutions at different scales across multiple branches and dynamically selecting operations. Finally, considering shallow features contain more detailed information, the Attention-Enhanced Fusion Network (AEFN) is constructed. The proposed approach refines and enriches features through attention mechanisms and cascading operations, endowing the features with comprehensive semantic and spatial details. Extensive experiments on the RSOD, NWPU VHR-10, URPC2020, and VisDrone-DET datasets demonstrate that LDA-DETR outperforms the state-of-the-art detection methods and further validate that the technique is better suited for small object detection applications.
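The abstract credits partial convolution for part of the backbone's parameter savings, but this summary gives no layer details. The sketch below is a rough illustration only. It assumes a FasterNet-style partial convolution, in which a fraction of the channels is convolved and the rest pass through unchanged; the paper's LFEB may differ. The function names, the 256-channel width, and the 1/4 ratio are hypothetical choices, not values from the paper.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a bias-free k x k convolution."""
    return c_in * c_out * k * k


def partial_conv_params(channels: int, k: int, ratio: float = 0.25) -> int:
    """Weight count when only a `ratio` share of the channels is convolved.

    Assumption: the untouched channels are forwarded as-is and add no weights.
    """
    c_part = int(channels * ratio)
    return conv_params(c_part, c_part, k)


# Hypothetical layer: 256 channels, 3 x 3 kernel, 1/4 of the channels convolved.
full = conv_params(256, 256, 3)
partial = partial_conv_params(256, 3, ratio=0.25)
print(full, partial, full // partial)
```

Under these assumed values the convolved slice needs 36,864 weights against 589,824 for the full layer, a 16x reduction, because the weight count scales with the square of the number of convolved channels.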

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12858003/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12858003/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC12858003/full.md

---
Source: https://tomesphere.com/paper/PMC12858003