Object Detection in the DCT Domain: is Luminance the Solution?

Benjamin Deguerre; Clement Chatelain; Gilles Gasso

arXiv:2006.05732·cs.CV·July 15, 2021

TL;DR

This paper explores object detection directly in the JPEG compressed domain, focusing on luminance information, achieving faster detection with minimal performance loss compared to RGB-based methods.

Contribution

It introduces detection architectures tailored for JPEG compression, demonstrating that luminance alone can suffice for accurate object detection, thus improving efficiency.

Findings

01

Achieves 1.7x speed-up over RGB-based detection architectures.

02

Luminance component alone can match full JPEG detection accuracy.

03

Reduces detection performance by only 5.5% compared with a standard RGB-based architecture.

Abstract

Object detection in images has reached unprecedented performance. State-of-the-art methods rely on deep architectures that extract salient features and predict bounding boxes enclosing the objects of interest. These methods essentially run on RGB images. However, RGB images are often compressed by the acquisition devices for storage and transfer efficiency, so they must be decompressed before being passed to an object detector. To gain efficiency, this paper proposes to take advantage of the compressed representation of images to carry out object detection usable under constrained-resource conditions. Specifically, we focus on JPEG images and propose a thorough analysis of detection architectures newly designed with regard to the peculiarities of the JPEG standard. This leads to a $\times 1.7$ speed-up in comparison with a standard RGB-based architecture, while only reducing the…
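
To make "luminance in the DCT domain" concrete, here is a minimal sketch of the kind of input such a detector would consume. It is an illustration only, not the authors' code: the function names `dct_matrix` and `luminance_dct_blocks` are hypothetical, the luminance weights are the usual JFIF/BT.601 ones, and quantization and entropy coding are skipped. A real compressed-domain pipeline would read the coefficients from the JPEG file rather than recompute them from RGB pixels as done here.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)       # DC row uses a different scale factor
    return m


def luminance_dct_blocks(rgb):
    """Map an (H, W, 3) RGB image to (H/8, W/8, 64) luminance DCT coefficients.

    Assumes H and W are multiples of 8. Quantization is omitted (assumption).
    """
    h, w, _ = rgb.shape
    assert h % 8 == 0 and w % 8 == 0, "pad the image to a multiple of 8 first"
    rgb = rgb.astype(np.float64)
    # JFIF luminance (Y) channel, level-shifted by 128 as in baseline JPEG.
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2] - 128.0
    # Split into non-overlapping 8x8 blocks: shape (H/8, W/8, 8, 8).
    blocks = y.reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    d = dct_matrix(8)
    coeffs = d @ blocks @ d.T        # 2D DCT applied to every block
    # Flatten each block's 64 coefficients into a channel axis.
    return coeffs.reshape(h // 8, w // 8, 64)


if __name__ == "__main__":
    image = np.random.default_rng(0).integers(0, 256, size=(16, 24, 3))
    print(luminance_dct_blocks(image).shape)
```

In this sketch each 8x8 pixel block becomes a single spatial position with 64 channels, so the input grid is 8 times smaller in each dimension than the RGB image. That change of shape is one plausible reason detection architectures need to be redesigned for JPEG input.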
