Context Enhanced Transformer for Single Image Object Detection

Seungjun An; Seonghoon Park; Gyeongnyeon Kim; Jeongyeol Baek; Byeongwon Lee; Seungryong Kim

arXiv:2312.14492·cs.CV·December 27, 2023·1 cites

Open Access

TL;DR

This paper introduces CETR, a novel single image object detection method that incorporates temporal context through a class-wise memory module and test-time adaptation, improving detection efficiency and accuracy.

Contribution

The paper proposes CETR, which integrates temporal context into single image detection through a class-wise memory module and test-time adaptation, avoiding the multiple inputs and additional network complexity that existing video object detection methods rely on.

Findings

01. Effective incorporation of temporal context improves detection accuracy.

02. Memory module efficiently stores and utilizes temporal information.

03. Test-time adaptation enhances performance on video datasets.

Abstract

With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-based VOD methods have shown promising results, their reliance on multiple inputs and additional network complexity to incorporate temporal information limits their practical applicability. In this paper, we propose a novel approach to single image object detection, called Context Enhanced TRansformer (CETR), by incorporating temporal context into DETR using a newly designed memory module. To efficiently store temporal information, we construct a class-wise memory that collects contextual…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Video Surveillance and Tracking Methods

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dropout · Layer Normalization · Byte Pair Encoding · Softmax · Multi-Head Attention