STF: Spatio-Temporal Fusion Module for Improving Video Object Detection

Noreen Anwar; Guillaume-Alexandre Bilodeau; Wassim Bouachir

arXiv:2402.10752 · cs.CV · February 19, 2024 · 2 cites

TL;DR

This paper introduces a spatio-temporal fusion module that uses complementary information from consecutive video frames to improve object detection accuracy, relying on attention modules and learnable merging of feature maps.

Contribution

The STF framework combines multi-frame and single-frame attention modules with a dual-frame fusion module to improve video object detection over baseline detectors.

Findings

01. Improved detection accuracy on three benchmark datasets.
02. Effective use of attention modules for feature sharing.
03. Learnable fusion enhances feature robustness.

Abstract

Consecutive frames in a video contain redundancy, but they may also contain relevant complementary information for the detection task. The objective of our work is to leverage this complementary information to improve detection. Therefore, we propose a spatio-temporal fusion framework (STF). We first introduce multi-frame and single-frame attention modules that allow a neural network to share feature maps between nearby frames to obtain more robust object representations. Second, we introduce a dual-frame fusion module that merges feature maps in a learnable manner to improve them. Our evaluation is conducted on three different benchmarks including video sequences of moving road users. The performed experiments demonstrate that the proposed spatio-temporal fusion module leads to improved detection performance compared to baseline object detectors. Code is available at…
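
To make the idea of merging feature maps "in a learnable manner" concrete, the following is a minimal, dependency-free sketch of one common way to do it: a sigmoid gate that blends features from the current frame with features from a nearby frame. This is an illustration only and is not taken from the paper or its repository. The gate form, the function name `gated_fusion`, and the scalar parameters `w_cur`, `w_prev`, and `bias` are all assumptions; in a real detector the gate would typically be computed by convolution layers over full feature maps, and its parameters would be trained.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    current: Vector,
    previous: Vector,
    w_cur: float,
    w_prev: float,
    bias: float,
) -> Vector:
    """Blend two feature vectors with a per-element gate (hypothetical sketch).

    gate  = sigmoid(w_cur * c + w_prev * p + bias)
    fused = gate * c + (1 - gate) * p

    w_cur, w_prev and bias stand in for learnable parameters; here they
    are plain floats so the example stays self-contained.
    """
    if len(current) != len(previous):
        raise ValueError("feature vectors must have the same length")

    fused = []
    for c, p in zip(current, previous):
        gate = sigmoid(w_cur * c + w_prev * p + bias)
        fused.append(gate * c + (1.0 - gate) * p)
    return fused


# With all parameters at zero the gate is 0.5, so the result is a plain average.
averaged = gated_fusion([1.0, 3.0], [3.0, 1.0], w_cur=0.0, w_prev=0.0, bias=0.0)

# A large positive bias pushes the gate toward 1, so the current frame dominates.
current_only = gated_fusion([1.0, 3.0], [3.0, 1.0], w_cur=0.0, w_prev=0.0, bias=50.0)
```

With zero parameters the fusion reduces to averaging the two frames; training would move the gate away from 0.5 wherever one frame carries the more reliable features.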

Code & Models

Repositories

noreenanwar/stf-module (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Infrared Target Detection Methodologies · Video Surveillance and Tracking Methods