Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection

Zhe Chen; Jing Zhang; Yufei Xu; Dacheng Tao

arXiv:2207.06603·cs.CV·August 29, 2023

TL;DR

This paper introduces a lightweight Transformer-based context condensation module that enhances feature pyramid fusion in object detection, improving accuracy and reducing computational costs across multiple detectors.

Contribution

The paper proposes a novel context modeling mechanism that uses local and global representations, integrated with a Transformer decoder, to make feature fusion more efficient and more effective.

Findings

1. Improves detection accuracy by up to 7.8% in average precision (AP) on MS COCO
2. Reduces computational cost by around 20% in GFLOPs
3. Compatible with multiple feature pyramid methods

Abstract

Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF), which aims to mitigate the gap between features from different levels and form a comprehensive object representation to achieve better detection performance. However, they usually require heavy cross-level connections or iterative refinement to obtain better MFF results, making them complicated in structure and inefficient in computation. To address these issues, we propose a novel and efficient context modeling mechanism that can help existing FPs deliver better MFF results while effectively reducing computational costs. In particular, we introduce a novel insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency. The two representations include a locally concentrated representation and a globally summarized…
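
The idea of condensing contexts into a small local part plus a global summary can be illustrated with a toy sketch. This is not the authors' implementation: the function names `condense` and `attend`, the 1-D list of context vectors, the fixed window around `center`, and the use of a plain mean as the global summary are all assumptions made for illustration. It only shows why attending over a condensed token set is cheaper than attending over every context vector.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def condense(context, center, radius):
    # Hypothetical condensation step (an assumption, not the paper's design):
    # keep a small window of context vectors around `center` as the local part
    # and append the mean of all context vectors as a single global token.
    lo = max(0, center - radius)
    hi = min(len(context), center + radius + 1)
    local = context[lo:hi]
    dim = len(context[0])
    global_summary = [sum(vec[d] for vec in context) / len(context)
                      for d in range(dim)]
    return local + [global_summary]

def attend(query, tokens):
    # Single-head scaled dot-product attention for one query vector,
    # using the tokens as both keys and values.
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    weights = softmax(scores)
    return [sum(w * tok[d] for w, tok in zip(weights, tokens))
            for d in range(len(query))]

# Twelve toy context vectors of dimension 2.
context = [[float(i), float(i % 3)] for i in range(12)]
condensed = condense(context, center=5, radius=1)
fused = attend([1.0, 0.5], condensed)
```

Here the query attends over 4 condensed tokens instead of all 12 context vectors, so the number of dot products per query drops accordingly. How the actual method chooses the local region and builds the global summary is not specified in the text above.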

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Absolute Position Encodings · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Adam