# HAR-Net: Joint Learning of Hybrid Attention for Single-stage Object Detection

**Authors:** Ya-Li Li, Shengjin Wang

arXiv: 1904.11141 · 2020-02-19

## TL;DR

HAR-Net unifies spatial, channel, and aligned attention modules into a hybrid attention mechanism and embeds it in Retina-Net. The resulting single-stage detector reaches 45.8% mAP on the COCO detection dataset, which the authors report as state of the art among single-stage detectors.

## Contribution

The paper proposes a hybrid attention mechanism that unifies spatial, channel, and aligned attention, and embeds it into Retina-Net to form the single-stage detector HAR-Net.

## Key findings

- Hybrid attention significantly improves detection accuracy.
- HAR-Net achieves 45.8% mAP on COCO, outperforming existing single-stage object detectors.
- The integrated attention modules enhance feature representation.

## Abstract

Object detection has been a challenging task in computer vision. Although significant progress has been made in object detection with deep neural networks, the attention mechanism remains underdeveloped. In this paper, we propose a hybrid attention mechanism for single-stage object detection. First, we present modules for spatial attention, channel attention, and aligned attention. In particular, stacked dilated convolution layers with symmetrically fixed rates are constructed to learn spatial attention. Channel attention is built from cross-level group normalization and a squeeze-and-excitation module. Aligned attention is constructed with organized deformable filters. Second, the three kinds of attention are unified to construct the hybrid attention mechanism. We then embed the hybrid attention into Retina-Net and propose the efficient single-stage HAR-Net for object detection. The attention modules and the proposed HAR-Net are evaluated on the COCO detection dataset. Experiments demonstrate that hybrid attention significantly improves detection accuracy and that HAR-Net achieves a state-of-the-art 45.8% mAP, outperforming existing single-stage object detectors.
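To make the channel attention idea concrete, here is a minimal pure-Python sketch of generic squeeze-and-excitation gating, the building block the abstract names. It is not the authors' implementation: it omits the cross-level group normalization, and the function name `squeeze_excite` and the weight matrices `w_reduce` and `w_expand` are hypothetical names chosen for illustration.

```python
import math


def squeeze_excite(feature_map, w_reduce, w_expand):
    """Generic squeeze-and-excitation gating (illustrative sketch only).

    feature_map: list of C channels, each an H x W nested list of floats.
    w_reduce:    hypothetical bottleneck weights, shape (C // r) x C.
    w_expand:    hypothetical expansion weights, shape C x (C // r).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w_reduce
    ]

    # Excitation, step 2: expansion layer followed by a sigmoid, giving one
    # gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]

    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates
```

A small usage example with made-up numbers: two 2x2 channels and a bottleneck of width one.

```python
features = [
    [[1.0, 2.0], [3.0, 4.0]],   # channel 0
    [[0.0, 0.0], [0.0, 8.0]],   # channel 1
]
scaled, gates = squeeze_excite(features, [[0.5, 0.5]], [[1.0], [-1.0]])
```

In a real detector the weights would be learned jointly with the rest of the network, so the gates emphasize informative channels and suppress the others.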

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.11141/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1904.11141/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/1904.11141/full.md

---
Source: https://tomesphere.com/paper/1904.11141