UniHead: Unifying Multi-Perception for Detection Heads

Hantao Zhou; Rui Yang; Yachao Zhang; Haoran Duan; Yawen Huang; Runze Hu; Xiu Li; Yefeng Zheng

arXiv:2309.13242·cs.CV·June 11, 2024·1 cites

TL;DR

UniHead is a unified detection head that adds deformation, global, and cross-task perception through transformer-based modules, improving object detection accuracy across several detectors.

Contribution

The paper introduces UniHead, a detection head that unifies deformation, global, and cross-task perception in a single design built from transformer modules.

Findings

01  Achieves +2.7 AP on RetinaNet

02  Achieves +2.9 AP on FreeAnchor

03  Achieves +2.1 AP on GFL

Abstract

The detection head constitutes a pivotal component within object detectors, tasked with executing both classification and localization functions. Regrettably, the commonly used parallel head often lacks omni perceptual capabilities, such as deformation perception, global perception and cross-task perception. Despite numerous methods attempting to enhance these abilities from a single aspect, achieving a comprehensive and unified solution remains a significant challenge. In response to this challenge, we develop an innovative detection head, termed UniHead, to unify three perceptual abilities simultaneously. More precisely, our approach (1) introduces deformation perception, enabling the model to adaptively sample object features; (2) proposes a Dual-axial Aggregation Transformer (DAT) to adeptly model long-range dependencies, thereby achieving global perception; and (3) devises a…
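To make point (2) of the abstract more concrete, the following is a minimal sketch of generic axial self-attention, which attends along rows and then along columns of a feature map so that every position can receive information from the whole map in two passes. It is not the paper's DAT module: the function names are hypothetical, there are no learned projections (queries, keys, and values are all the raw features), and how the two axes are combined is an assumption made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(seq):
    """Self-attention over a list of C-dim vectors.

    Assumption: Q = K = V = input, i.e. no learned projections.
    """
    scale = math.sqrt(len(seq[0]))
    out = []
    for q in seq:
        # Scaled dot-product similarity between this query and every key.
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in seq])
        # Weighted sum of the values, channel by channel.
        out.append([sum(w * v[c] for w, v in zip(weights, seq)) for c in range(len(q))])
    return out

def row_attention(fmap):
    """Attend within each row of an H x W x C feature map."""
    return [attend(row) for row in fmap]

def col_attention(fmap):
    """Attend within each column by transposing, attending, and transposing back."""
    cols = [list(col) for col in zip(*fmap)]      # W lists of H vectors
    attended = [attend(col) for col in cols]
    return [list(row) for row in zip(*attended)]  # back to H lists of W vectors

def axial_attention(fmap):
    """Row pass followed by column pass (sequential combination is an assumption)."""
    return col_attention(row_attention(fmap))

# Toy 2 x 3 feature map with 2 channels; the second channel is constant.
fmap = [[[float(r + c), 1.0] for c in range(3)] for r in range(2)]
out = axial_attention(fmap)
```

The motivation for attending per axis is cost: full self-attention over an H x W map compares (HW)^2 position pairs, while the two axial passes compare HW(H + W) pairs.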

Code & Models

Repositories

zht8506/unihead (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Tactile and Sensory Interactions · Visual Attention and Saliency Detection

Methods: Multi-Head Attention · Attention Is All You Need · 1x1 Convolution · Convolution · Feature Pyramid Network · Focal Loss · RetinaNet · Layer Normalization · Label Smoothing · Dropout