Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints to Better Classify Objects in Videos
Sukjun Hwang, Miran Heo, Seoung Wug Oh, Seon Joo Kim

TL;DR
This paper introduces a set classifier that aggregates multiple viewpoints within tracklets to improve object classification in videos, especially under long-tailed distributions, achieving state-of-the-art results.
Contribution
The paper proposes a novel set classifier and tracklet augmentation method that enhance long-tailed object tracking performance by leveraging multiple viewpoints and handling sparse annotations.
Findings
Achieved new state-of-the-art TrackAP_50 scores on the TAO dataset.
Significantly improved classification accuracy in long-tailed video object tracking.
Demonstrated plug-and-play compatibility with existing trackers.
Abstract
Recently, both long-tailed recognition and object tracking have made great advances individually. The TAO benchmark presented a mixture of the two, long-tailed object tracking, in order to further reflect aspects of the real world. To date, existing solutions have adopted detectors showing robustness in long-tailed distributions, which derive per-frame results. Then, they used tracking algorithms that combine the temporally independent detections to finalize tracklets. However, as the approaches did not take temporal changes in scenes into account, inconsistent classification results in videos led to low overall performance. In this paper, we present a set classifier that improves the accuracy of classifying tracklets by aggregating information from multiple viewpoints contained in a tracklet. To cope with sparse annotations in videos, we further propose augmentation of tracklets that can…
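To make the contrast between per-frame classification and tracklet-level classification concrete, here is a minimal sketch. It is not the paper's learned set classifier: it substitutes plain averaging of per-frame softmax probabilities as a stand-in for aggregation, and the function names and logit values are hypothetical, chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def per_frame_labels(tracklet_logits):
    """Classify each frame independently (argmax per frame)."""
    return [max(range(len(f)), key=f.__getitem__) for f in tracklet_logits]

def tracklet_label(tracklet_logits):
    """Assign one label to the whole tracklet by averaging per-frame
    probabilities. Assumption: mean pooling stands in for the learned
    set classifier described in the abstract."""
    probs = [softmax(f) for f in tracklet_logits]
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__), mean

# Hypothetical logits for one tracklet: 4 frames, 3 classes.
logits = [
    [2.0, 1.0, 0.0],
    [1.8, 1.5, 0.1],
    [0.5, 1.0, 0.2],  # ambiguous viewpoint: the per-frame argmax flips to class 1
    [2.2, 0.9, 0.0],
]

frame_labels = per_frame_labels(logits)
label, mean_probs = tracklet_label(logits)
print(frame_labels)  # per-frame labels disagree on the third frame
print(label)         # a single label for the whole tracklet
```

In this toy case the third frame is classified differently from the others when frames are treated independently, while pooling evidence across the tracklet yields one consistent label. A learned aggregator, as the abstract describes, would replace the mean with a trained model over the set of per-frame features.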
Taxonomy
Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · IoT-based Smart Home Systems
