Hyneter: Hybrid Network Transformer for Object Detection

Dong Chen; Duoqian Miao; Xuerong Zhao

arXiv:2302.09365 · cs.CV · February 21, 2023

TL;DR

Hyneter is a hybrid vision Transformer that combines local and global features to improve object detection, especially for small objects, by integrating CNN and Transformer components.

Contribution

The paper introduces the Hybrid Network Transformer (Hyneter), which pairs a Hybrid Network Backbone (HNB) with a Dual Switching (DS) module to fuse local information and global dependencies for object detection.

Findings

01. Improved detection accuracy for small objects.

02. Effective integration of CNN and Transformer features.

03. Enhanced balance between local and global information.

Abstract

In this paper, we point out that the essential difference between CNN-based and Transformer-based detectors, which causes the worse performance of Transformer-based methods on small objects, is the gap between local information and global dependencies in feature extraction and propagation. To address this difference, we propose a new vision Transformer, called Hybrid Network Transformer (Hyneter), after pre-experiments indicating that the gap causes CNN-based and Transformer-based methods to improve results unevenly across objects of different sizes. Unlike the divide-and-conquer strategy of previous methods, Hyneter consists of a Hybrid Network Backbone (HNB) and a Dual Switching module (DS), which integrate local information and global dependencies and transfer them simultaneously. Based on the balance strategy, HNB extends the range of local information by embedding convolution layers…
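The abstract is cut off before it describes how HNB or DS are built, so the following is only a toy illustration of the general idea it names: mixing a local (convolutional) branch with a global (self-attention) branch over the same tokens. It is not the paper's architecture. The function names `local_conv`, `self_attention`, and `hybrid_block`, the fixed 3-tap kernel, the identity query/key/value projections, and the residual sum used to merge the branches are all assumptions made for this sketch.

```python
import math


def local_conv(tokens, kernel=(0.25, 0.5, 0.25)):
    """Local branch: 1-D convolution along the token axis with zero padding.

    Each output token only sees its immediate neighbours.
    The fixed smoothing kernel is an assumption, not a learned filter.
    """
    n, half, dim = len(tokens), len(kernel) // 2, len(tokens[0])
    out = []
    for i in range(n):
        acc = [0.0] * dim
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # positions outside the sequence count as zeros
                for c in range(dim):
                    acc[c] += w * tokens[j][c]
        out.append(acc)
    return out


def self_attention(tokens):
    """Global branch: single-head scaled dot-product attention.

    Queries, keys, and values are the tokens themselves (identity
    projections), so every output token is a softmax-weighted average
    of all tokens in the sequence.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w / total * v[c] for w, v in zip(weights, tokens)) for c in range(dim)]
        )
    return out


def hybrid_block(tokens):
    """Hypothetical fusion: input + local branch + global branch."""
    local = local_conv(tokens)
    glob = self_attention(tokens)
    return [
        [x + l + g for x, l, g in zip(tok, loc, gl)]
        for tok, loc, gl in zip(tokens, local, glob)
    ]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    for row in hybrid_block(feats):
        print([round(v, 3) for v in row])
```

In this toy, the convolution can only move information one position per application, while attention lets any token influence any other in a single step. That is the local versus global contrast the abstract refers to; how Hyneter actually balances and transfers the two is described in the full paper, not here.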

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Infrared Target Detection Methodologies

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Dense Connections · Absolute Position Encodings · Adam · Position-Wise Feed-Forward Layer · Dropout · Byte Pair Encoding