BoxMask: Revisiting Bounding Box Supervision for Video Object Detection

Khurram Azeem Hashmi; Alain Pagani; Didier Stricker; Muhammamd Zeshan Afzal

arXiv:2210.06008 · cs.CV · October 13, 2022

TL;DR

BoxMask adds class-aware, pixel-level supervision to video object detection. It improves detection accuracy by learning object representations that are more discriminative than the instance-level features used in prior work.

Contribution

The paper proposes BoxMask, a module for video object detection. It treats bounding-box annotations as coarse masks and uses them to learn discriminative pixel-level features.
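
The contribution depends on turning each annotated box into a coarse, class-labelled mask. Below is a minimal sketch of that rasterization under several assumptions. Pixel coordinates are integers, with exclusive right and bottom edges. Class id 0 means background. Where boxes overlap, later boxes overwrite earlier ones; the paper's overlap rule is not stated on this page. `box_to_class_mask` is a hypothetical helper, not code from the paper.

```python
def box_to_class_mask(boxes, labels, height, width):
    """Rasterize boxes into an H x W grid of class ids (0 = background).

    boxes:  list of (x1, y1, x2, y2) pixel coordinates, x2/y2 exclusive.
    labels: list of positive integer class ids, one per box.
    Overlaps: later boxes overwrite earlier ones (an assumption, not from the paper).
    """
    mask = [[0] * width for _ in range(height)]
    for (x1, y1, x2, y2), cls in zip(boxes, labels):
        # Clip the box to the image bounds before filling it.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(width, int(x2)), min(height, int(y2))
        for y in range(y1, y2):
            for x in range(x1, x2):
                mask[y][x] = cls
    return mask
```

Consider `box_to_class_mask([(1, 1, 3, 3), (2, 0, 4, 2)], [1, 2], 4, 5)`. It marks a 2 x 2 patch with class 1 and a 2 x 2 patch with class 2. Class 2 wins on the single pixel the boxes share. Every pixel inside a box gets the object's label, including background pixels within the box. That is what makes the mask "coarse".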

Findings

1. Consistent improvement on the ImageNet VID and EPIC KITCHENS datasets.
2. Effective integration into a range of state-of-the-art region-based detectors.
3. Significant gains in detection accuracy with minimal added complexity.

Abstract

We present a new, simple yet effective approach to improve video object detection. We observe that prior works rely on instance-level feature aggregation, which inevitably neglects refined pixel-level representations. As a result, they confuse objects that share similar appearance or motion characteristics. To address this limitation, we propose BoxMask, which learns discriminative representations by incorporating class-aware pixel-level information. To supervise our method, we simply treat bounding-box annotations as a coarse mask for each object. The proposed module can be easily integrated into any region-based detector to boost detection. Extensive experiments on the ImageNet VID and EPIC KITCHENS datasets demonstrate consistent and significant improvements when we plug our BoxMask module into numerous recent state-of-the-art methods.
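
The abstract describes supervising class-aware, pixel-level predictions with box-derived coarse masks, added on top of a region-based detector. Below is a minimal sketch of such an auxiliary loss, under several assumptions. The loss is a per-pixel softmax cross-entropy over K object classes plus background (index 0). It is added to the detector's usual loss with a hypothetical weight `mask_weight`. The function names are illustrative, and this is not the authors' implementation; the actual head may, for example, operate on region features rather than full frames.

```python
import math


def pixel_class_loss(logits, target):
    """Mean per-pixel softmax cross-entropy (assumed loss form).

    logits: H x W grid; each cell is a list of K + 1 scores (index 0 = background).
    target: H x W grid of integer class ids, e.g. a coarse box-derived mask.
    """
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, target):
        for scores, cls in zip(logit_row, target_row):
            # Numerically stable log-sum-exp over the class scores.
            m = max(scores)
            log_z = m + math.log(sum(math.exp(s - m) for s in scores))
            # Negative log-probability of the target class at this pixel.
            total += log_z - scores[cls]
            count += 1
    if count == 0:
        raise ValueError("empty logits/target")
    return total / count


def total_loss(detection_loss, logits, target, mask_weight=1.0):
    # mask_weight is a hypothetical balancing factor, not a value from the paper.
    return detection_loss + mask_weight * pixel_class_loss(logits, target)
```

With uniform scores, every pixel costs log(K + 1). Three classes (background plus two objects) therefore give about 1.0986 per pixel. Confident, correct scores drive the term toward zero. Keeping the pixel term as a separate additive loss is what lets a module like this be attached to an existing detector without changing its detection losses.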

Videos

BoxMask: Revisiting Bounding Box Supervision for Video Object Detection (YouTube)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning