Ensembling object detectors for image and video data analysis
Kateryna Chumachenko, Jenni Raitoharju, Alexandros Iosifidis, Moncef Gabbouj

TL;DR
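This paper introduces a method for ensembling the outputs of multiple object detectors to improve detection performance and bounding box precision on images, extends it to video with a two-stage tracking-based refinement scheme, and positions it for standalone detection or faster bounding box annotation of unseen datasets.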
Contribution
It presents a novel ensembling approach for image and video object detection, including a tracking-based refinement scheme for videos.
Findings
Improved detection performance through ensembling.
Enhanced bounding box precision in images and videos.
Usable as a standalone detection approach or as part of a framework for faster bounding box annotation of unseen datasets.
Abstract
In this paper, we propose a method for ensembling the outputs of multiple object detectors for improving detection performance and precision of bounding boxes on image data. We further extend it to video data by proposing a two-stage tracking-based scheme for detection refinement. The proposed method can be used as a standalone approach for improving object detection performance, or as a part of a framework for faster bounding box annotation in unseen datasets, assuming that the objects of interest are those present in some common public datasets.
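The abstract does not spell out how detector outputs are combined, so the following is only a generic sketch of the idea, not the authors' algorithm. It assumes each detector returns (box, confidence, label) triples, groups same-class boxes by IoU, keeps groups supported by at least `min_votes` distinct detectors, and fuses each group by confidence-weighted box averaging. The names `iou`, `ensemble_detections`, `iou_thr`, and `min_votes` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float, str]       # (box, confidence, class label)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ensemble_detections(
    per_detector: List[List[Detection]],
    iou_thr: float = 0.5,   # assumed matching threshold
    min_votes: int = 2,     # assumed minimum number of agreeing detectors
) -> List[Detection]:
    """Fuse detections from several detectors (illustrative scheme only)."""
    # Pool all detections, remembering which detector produced each one.
    pooled = [(det, idx) for idx, dets in enumerate(per_detector) for det in dets]
    pooled.sort(key=lambda item: item[0][1], reverse=True)

    # Greedy clustering: attach each box to the first cluster whose
    # highest-confidence member has the same label and enough overlap.
    clusters: List[List[Tuple[Detection, int]]] = []
    for det, idx in pooled:
        box, _, label = det
        for cluster in clusters:
            lead_box, _, lead_label = cluster[0][0]
            if label == lead_label and iou(box, lead_box) >= iou_thr:
                cluster.append((det, idx))
                break
        else:
            clusters.append([(det, idx)])

    fused: List[Detection] = []
    for cluster in clusters:
        voters = {idx for _, idx in cluster}
        total = sum(det[1] for det, _ in cluster)
        if len(voters) < min_votes or total <= 0:
            continue  # too few detectors agree, or no usable confidence
        # Confidence-weighted average of the box coordinates.
        box = tuple(
            sum(det[0][k] * det[1] for det, _ in cluster) / total
            for k in range(4)
        )
        fused.append((box, total / len(cluster), cluster[0][0][2]))
    return fused
```

Requiring agreement between detectors suppresses isolated false positives, while averaging overlapping boxes is one simple way to tighten localization; other fusion rules (for example, keeping the highest-confidence box) would fit the same skeleton.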
