Unified Object Detector for Different Modalities based on Vision Transformers

Xiaoke Shen; Ioannis Stamos

arXiv:2207.01071·cs.CV·May 9, 2023

TL;DR

This paper presents a unified object detection model based on vision transformers that seamlessly switches between RGB and depth modalities without retraining, demonstrating superior performance across diverse conditions.

Contribution

The paper introduces a novel unified detection framework combining cross/inter-modality transfer learning with vision transformers, enabling modality switching without model updates.

Findings

01 Achieves performance comparable to or better than state-of-the-art methods on the SUN RGB-D dataset.

02 Introduces a novel inter-modality mixing method that improves results.

03 Demonstrates effective modality switching in robotics scenarios.

Abstract

Traditional systems typically require different models for processing different modalities, such as one model for RGB images and another for depth images. Recent research has demonstrated that a single model for one modality can be adapted for another using cross-modality transfer learning. In this paper, we extend this approach by combining cross/inter-modality transfer learning with a vision transformer to develop a unified detector that achieves superior performance across diverse modalities. Our research envisions an application scenario for robotics, where the unified system seamlessly switches between RGB cameras and depth sensors in varying lighting conditions. Importantly, the system requires no model architecture or weight updates to enable this smooth transition. Specifically, the system uses the depth sensor during low-lighting conditions (night time) and both the RGB camera…
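The switching behaviour the abstract describes can be illustrated with a small sketch. This is not the authors' implementation: `UnifiedDetector`, `choose_modalities`, `run`, and the `LOW_LIGHT_LUX` threshold are hypothetical names and values introduced only for illustration. The one rule taken from the abstract is that low light selects the depth sensor; the bright-light choice is left to the caller because the abstract is truncated at that point.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Image = List[List[float]]  # toy stand-in for a sensor frame

LOW_LIGHT_LUX = 10.0  # hypothetical threshold, chosen only for illustration


@dataclass(frozen=True)
class UnifiedDetector:
    """Stand-in for one trained detector; frozen, so its weights cannot change."""
    weights_id: str

    def detect(self, image: Image, modality: str) -> Dict[str, object]:
        # A real model would run a forward pass here. This stub only records
        # which modality was processed and which weights were used.
        return {"modality": modality, "weights_id": self.weights_id}


def choose_modalities(lux: float, bright_modalities: Tuple[str, ...]) -> Tuple[str, ...]:
    """Pick the sensors to read for the current lighting level.

    Low light -> depth only, as the abstract states. The bright-light choice
    is supplied by the caller because the abstract is truncated at that point.
    """
    return ("depth",) if lux < LOW_LIGHT_LUX else bright_modalities


def run(detector: UnifiedDetector, frames: Dict[str, Image], lux: float,
        bright_modalities: Tuple[str, ...]) -> List[Dict[str, object]]:
    """Run the same detector, unchanged, on every selected modality."""
    selected = choose_modalities(lux, bright_modalities)
    return [detector.detect(frames[m], m) for m in selected]
```

The point of the sketch is that only the input selection changes with lighting; the same detector object, with the same weights, handles every frame.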

Code & Models

Repositories

liketheflower/uoddm (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Industrial Vision Systems and Defect Detection

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Layer Normalization · Softmax · Multi-Head Attention · Residual Connection · Vision Transformer