Real-Time Indoor Object Detection based on hybrid CNN-Transformer Approach

Salah Eddine Laidoudi; Madjid Maidi; Samir Otmane

arXiv:2409.01871 · cs.CV · September 4, 2024 · 2 cites

TL;DR

This paper introduces a hybrid CNN-Transformer approach for real-time indoor object detection, addressing challenges like variable lighting and complex backgrounds, and presents a new dataset tailored for indoor categories.

Contribution

The study develops a refined indoor dataset and adapts a CNN model with attention mechanisms, improving accuracy and speed for real-time indoor object detection.

Findings

01. Accuracy and speed competitive with state-of-the-art models

02. Enhanced feature prioritization in cluttered scenes

03. New dataset tailored for indoor object detection

Abstract

Real-time object detection in indoor settings is a challenging area of computer vision, faced with unique obstacles such as variable lighting and complex backgrounds. This field holds significant potential to revolutionize applications like augmented and mixed realities by enabling more seamless interactions between digital content and the physical world. However, the scarcity of research specifically fitted to the intricacies of indoor environments has highlighted a clear gap in the literature. To address this, our study delves into the evaluation of existing datasets and computational models, leading to the creation of a refined dataset. This new dataset is derived from OpenImages v7, focusing exclusively on 32 indoor categories selected for their relevance to real-world applications. Alongside this, we present an adaptation of a CNN detection model, incorporating an attention…
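
As a rough illustration of how a category-restricted subset can be derived from a larger detection dataset, the sketch below filters box annotations down to a chosen set of class names. It is not the authors' pipeline: the row layout, the helper names (`filter_annotations`, `images_with_boxes`, `class_counts`), and the three class names are assumptions made for the example, and the paper's actual 32 categories are not listed here.

```python
from collections import Counter

# Hypothetical stand-in for the indoor category list; the paper's real
# 32 categories are not given on this page.
INDOOR_CLASSES = {"Chair", "Table", "Laptop"}


def filter_annotations(rows, keep=INDOOR_CLASSES):
    """Return only the box annotations whose label is in `keep`."""
    return [row for row in rows if row["label"] in keep]


def images_with_boxes(rows):
    """Return the sorted ids of images that still have at least one box."""
    return sorted({row["image_id"] for row in rows})


def class_counts(rows):
    """Count remaining boxes per class, useful for checking class balance."""
    return Counter(row["label"] for row in rows)


# Toy annotations in an assumed layout: one dict per bounding box,
# with normalized (xmin, ymin, xmax, ymax) coordinates.
rows = [
    {"image_id": "img1", "label": "Chair", "box": (0.10, 0.20, 0.50, 0.90)},
    {"image_id": "img1", "label": "Car", "box": (0.00, 0.00, 0.30, 0.30)},
    {"image_id": "img2", "label": "Tree", "box": (0.40, 0.10, 0.80, 0.70)},
    {"image_id": "img3", "label": "Laptop", "box": (0.25, 0.35, 0.60, 0.65)},
]

subset = filter_annotations(rows)
print(images_with_boxes(subset))
print(class_counts(subset))
```

Images whose boxes all fall outside the chosen classes (here `img2`) drop out of the subset entirely, which is one reason to inspect per-class counts after filtering.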

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Visual Attention and Saliency Detection

Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings