Lightweight Multi-Frame Integration for Robust YOLO Object Detection in Videos
Yitong Quan, Benjamin Kiefer, Martin Messmer, Andreas Zell

TL;DR
This paper introduces a simple multi-frame stacking method for YOLO-based video object detection that enhances robustness and maintains real-time performance, especially benefiting lightweight models in challenging scenarios.
Contribution
It proposes a minimal modification to YOLO detectors that incorporates temporal context, improving detection robustness without adding the complex temporal modules used by existing video-based methods.
Findings
Enhanced detection robustness in videos with motion blur and occlusions.
Significant performance gains for lightweight models.
Introduction of the BOAT360 dataset for real-world evaluation.
Abstract
Modern image-based object detection models, such as YOLOv7, primarily process individual frames independently, thus ignoring valuable temporal context naturally present in videos. Meanwhile, existing video-based detection methods often introduce complex temporal modules, significantly increasing model size and computational complexity. In practical applications such as surveillance and autonomous driving, transient challenges including motion blur, occlusions, and abrupt appearance changes can severely degrade single-frame detection performance. To address these issues, we propose a straightforward yet highly effective strategy: stacking multiple consecutive frames as input to a YOLO-based detector while supervising only the output corresponding to a single target frame. This approach leverages temporal information with minimal modifications to existing architectures, preserving…
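The frame-stacking idea described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name `stack_frames`, the window size of 3, the choice of the last frame in the window as the target frame, and the repetition of the first frame at the start of a clip are all assumptions made for the example, since the excerpt does not specify them.

```python
import numpy as np


def stack_frames(frames, target_idx, num_frames=3):
    """Stack consecutive frames along the channel axis (hypothetical helper).

    frames:     array of shape (T, H, W, C) holding a video clip.
    target_idx: index of the frame whose annotations would supervise training.
    num_frames: window size (assumed value; not taken from the paper).

    Returns an array of shape (H, W, num_frames * C) whose last C channels
    are the target frame and whose earlier channels are the preceding frames.
    """
    # Window of indices ending at the target frame. Indices before the start
    # of the clip are clamped to 0, i.e. the first frame is repeated
    # (an assumed padding choice).
    idxs = [max(target_idx - k, 0) for k in range(num_frames - 1, -1, -1)]
    window = frames[idxs]  # shape: (num_frames, H, W, C)

    # Move the time axis next to the channel axis, then merge the two:
    # (num_frames, H, W, C) -> (H, W, num_frames, C) -> (H, W, num_frames * C)
    height, width = window.shape[1], window.shape[2]
    return np.transpose(window, (1, 2, 0, 3)).reshape(height, width, -1)
```

Under these assumptions, a detector consuming the stacked input would need its first convolution to accept `num_frames * C` input channels, while the training labels would come only from the frame at `target_idx`, matching the abstract's description of supervising the output for a single target frame.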
Taxonomy
Topics: Infrared Target Detection Methodologies · Advanced Image and Video Retrieval Techniques · Image Processing Techniques and Applications
