Real-Time Indoor Object Detection based on hybrid CNN-Transformer Approach
Salah Eddine Laidoudi, Madjid Maidi, Samir Otmane

TL;DR
This paper introduces a hybrid CNN-Transformer approach for real-time indoor object detection, addressing challenges like variable lighting and complex backgrounds, and presents a new dataset tailored for indoor categories.
Contribution
The study develops a refined indoor dataset and adapts a CNN model with attention mechanisms, improving accuracy and speed for real-time indoor object detection.
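As a rough illustration of what adding attention to CNN features can look like, the sketch below applies single-head scaled dot-product self-attention to a flattened feature map. It is not the authors' architecture: the function names, the random projection matrices, and the 8x4x4 feature-map size are all assumptions chosen for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_2d(feat, w_q, w_k, w_v):
    """Single-head self-attention over a (C, H, W) feature map.

    Each spatial location is treated as one token of dimension C.
    Returns the attended map (C_out, H, W) and the (N, N) attention weights,
    where N = H * W.
    """
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T            # (N, C)
    q = tokens @ w_q                             # queries (N, D)
    k = tokens @ w_k                             # keys    (N, D)
    v = tokens @ w_v                             # values  (N, C_out)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # scaled dot products (N, N)
    weights = softmax(scores, axis=-1)           # each row sums to 1
    out = weights @ v                            # weighted sum of values
    return out.T.reshape(-1, h, w), weights


# Hypothetical inputs: a random "CNN feature map" and random projections.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w_q = rng.standard_normal((8, 8)) * 0.1
w_k = rng.standard_normal((8, 8)) * 0.1
w_v = rng.standard_normal((8, 8)) * 0.1

out, weights = self_attention_2d(feat, w_q, w_k, w_v)
```

Each row of `weights` describes how strongly one spatial location attends to every other location, which is the mechanism that lets a model emphasize some regions of a cluttered scene over others.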
Findings
Accuracy and speed competitive with state-of-the-art models
Enhanced feature prioritization in cluttered scenes
New dataset tailored for indoor object detection
Abstract
Real-time object detection in indoor settings is a challenging area of computer vision, facing unique obstacles such as variable lighting and complex backgrounds. This field holds significant potential to revolutionize applications like augmented and mixed reality by enabling more seamless interactions between digital content and the physical world. However, the scarcity of research tailored to the intricacies of indoor environments has highlighted a clear gap in the literature. To address this, our study evaluates existing datasets and computational models, leading to the creation of a refined dataset. This new dataset is derived from OpenImages v7, focusing exclusively on 32 indoor categories selected for their relevance to real-world applications. Alongside this, we present an adaptation of a CNN detection model, incorporating an attention…
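Deriving a category-restricted subset from a larger detection dataset can be sketched as a filter over box annotations. The example below is only an assumption-laden illustration: the CSV columns are modeled loosely on box-annotation files, the rows are made up, readable class names stand in for real label identifiers, and `INDOOR_CATEGORIES` is a placeholder rather than the paper's list of 32 categories.

```python
import csv
import io

# Hypothetical annotation rows; real files are far larger and use their
# own label identifiers and additional columns.
ANNOTATIONS_CSV = """ImageID,LabelName,XMin,XMax,YMin,YMax
img001,chair,0.10,0.40,0.20,0.90
img001,car,0.50,0.95,0.30,0.80
img002,table,0.05,0.70,0.40,0.95
img003,tree,0.00,0.30,0.00,1.00
"""

# Placeholder subset; the actual indoor categories are not listed here.
INDOOR_CATEGORIES = {"chair", "table"}


def filter_annotations(csv_text, keep):
    """Return only the annotation rows whose label is in `keep`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["LabelName"] in keep]


kept = filter_annotations(ANNOTATIONS_CSV, INDOOR_CATEGORIES)

# Images that still have at least one box after filtering.
kept_images = sorted({row["ImageID"] for row in kept})
```

In this toy input, `img003` drops out entirely because none of its boxes belong to the kept categories, while `img001` is retained with only its indoor box.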
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Visual Attention and Saliency Detection
Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
