Erasure-based Interaction Network for RGBT Video Object Detection and A Unified Benchmark
Zhengzheng Tu, Qishun Wang, Hongshun Wang, Kunpeng Wang, Chenglong Li

TL;DR
This paper introduces a new RGBT video object detection task, proposes the EINet model, which uses thermal images to erase noise in RGB features, and establishes a comprehensive benchmark dataset (VT-VOD50) for evaluating both accuracy and efficiency.
Contribution
The work presents a novel Erasure-based Interaction Network (EINet) for RGBT VOD and creates the VT-VOD50 dataset, advancing research on RGBT video object detection.
Findings
EINet outperforms existing VOD methods in accuracy and efficiency.
Thermal images help reduce noise and improve detection in adverse conditions.
A small temporal window suffices for effective spatio-temporal modeling.
Abstract
Recently, many breakthroughs have been made in the field of Video Object Detection (VOD), but performance is still limited by the imaging limitations of RGB sensors under adverse illumination conditions. To alleviate this issue, this work introduces a new computer vision task, RGB-thermal (RGBT) VOD, which incorporates the thermal modality because it is insensitive to adverse illumination conditions. To promote the research and development of RGBT VOD, we design a novel Erasure-based Interaction Network (EINet) and establish a comprehensive benchmark dataset (VT-VOD50) for this task. Traditional VOD methods often exploit temporal information by using many auxiliary frames and thus incur a large computational burden. Considering that thermal images exhibit less noise than RGB ones, we develop a negative activation function that is used to erase the noise of RGB features with the help of thermal…
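The abstract names the mechanism but not its formula, so the following is only a minimal sketch of how thermal-guided erasure and a small temporal window could fit together, not EINet's actual formulation. Every name here (`negative_activation`, `erase_rgb_noise`, `temporal_window`, `aggregate`, `alpha`) is hypothetical. It assumes that "noise" is the positive part of the RGB response minus the thermal response, that the negative activation is -max(x, 0), and that temporal aggregation is a plain average over the current frame and at most `k` auxiliary frames on each side.

```python
from typing import List

Vector = List[float]


def negative_activation(x: float) -> float:
    """Hypothetical negative activation: returns -max(x, 0), so the output is always <= 0."""
    return -max(x, 0.0)


def erase_rgb_noise(rgb: Vector, thermal: Vector, alpha: float = 1.0) -> Vector:
    """Erase RGB responses that exceed the thermal response.

    Assumption (not from the paper): the excess rgb - thermal is treated as noise,
    and alpha is a hypothetical scale controlling how much of it is erased.
    """
    if len(rgb) != len(thermal):
        raise ValueError("rgb and thermal features must have the same length")
    return [r + alpha * negative_activation(r - t) for r, t in zip(rgb, thermal)]


def temporal_window(num_frames: int, t: int, k: int) -> List[int]:
    """Indices of frame t plus up to k auxiliary frames on each side, clipped to the video."""
    lo, hi = max(0, t - k), min(num_frames - 1, t + k)
    return list(range(lo, hi + 1))


def aggregate(features: List[Vector], t: int, k: int) -> Vector:
    """Average per-frame feature vectors over the small window around frame t."""
    idx = temporal_window(len(features), t, k)
    dim = len(features[t])
    return [sum(features[i][d] for i in idx) / len(idx) for d in range(dim)]


if __name__ == "__main__":
    rgb = [3.0, 1.0, 2.0]
    thermal = [1.0, 2.0, 2.0]
    print(erase_rgb_noise(rgb, thermal))  # only the first channel exceeds thermal
    print(temporal_window(10, 5, 2))      # current frame plus two neighbours each side
```

With `alpha = 1.0` the erasure reduces to an element-wise minimum of the RGB and thermal responses; a smaller `alpha` only partially suppresses the excess. Because `aggregate` touches only `2k + 1` frames, its cost grows with the window size rather than with the number of auxiliary frames a traditional VOD method might use.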
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection
