Efficient video annotation with visual interpolation and frame selection guidance
A. Kuznetsova, A. Talati, Y. Luo, K. Simmons, V. Ferrari

TL;DR
This paper presents a unified framework for efficient video annotation that combines bounding box interpolation and extrapolation with automatic selection of frames to annotate, reducing both the number of manually drawn boxes and the overall annotation time.
Contribution
It introduces a model that can both interpolate and extrapolate bounding boxes. It also introduces a guiding mechanism that sequentially suggests which frame to annotate next, based on the annotations made so far.
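The guiding mechanism is described only at a high level here. To make the idea of "sequentially suggesting the next frame from previous annotations" concrete, the sketch below uses a deliberately simple, hypothetical heuristic: it suggests the middle frame of the longest run of unannotated frames. This is not the paper's mechanism. The function name `suggest_next_frame` and the gap rule are assumptions made for illustration.

```python
from typing import List


def suggest_next_frame(annotated: List[int], num_frames: int) -> int:
    """Hypothetical guidance heuristic (NOT the paper's mechanism):
    suggest the middle frame of the longest run of unannotated frames.
    Returns -1 when every frame already has a manual box."""
    if num_frames <= 0:
        raise ValueError("video must contain at least one frame")
    keys = sorted(set(annotated))
    if not keys:
        return num_frames // 2  # nothing annotated yet: start mid-video

    best_frame, best_gap = -1, 0
    # Leading run: frames 0 .. keys[0]-1, reachable only by extrapolation.
    if keys[0] > 0:
        best_gap, best_frame = keys[0], (keys[0] - 1) // 2
    # Interior runs between consecutive keyframes (interpolation territory).
    for left, right in zip(keys, keys[1:]):
        gap = right - left - 1
        if gap > best_gap:
            best_gap, best_frame = gap, (left + right) // 2
    # Trailing run: frames keys[-1]+1 .. num_frames-1, again extrapolation.
    trailing = num_frames - 1 - keys[-1]
    if trailing > best_gap:
        best_gap, best_frame = trailing, (keys[-1] + num_frames) // 2
    return best_frame


# Sequential use: each suggestion depends on the frames annotated so far.
annotated: List[int] = []
for _ in range(3):
    annotated.append(suggest_next_frame(annotated, num_frames=100))
print(annotated)  # [50, 24, 75]
```

A gap rule like this ignores image content entirely. It is at best a simple reference point against which a learned guiding mechanism could be compared.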
Findings
Reduces manual bounding box annotations by 60% compared to linear interpolation.
Reduces annotation time by 10% compared with existing state-of-the-art methods.
Human studies show a 50% reduction in actual annotation time.
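The findings above measure savings against linear interpolation. For reference, here is a minimal sketch of that baseline. It makes two assumptions: boxes are `(x_min, y_min, x_max, y_max)` tuples, and frames outside the annotated range copy the nearest keyframe. That second rule is one common convention and not necessarily the exact baseline configuration used in the paper. The name `linear_interpolate` is illustrative.

```python
import bisect
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


def linear_interpolate(keyframes: Dict[int, Box], num_frames: int) -> List[Box]:
    """Linear-interpolation baseline: between two manual keyframes each box
    coordinate is blended linearly in time; frames before the first or after
    the last keyframe copy the nearest keyframe (no real extrapolation)."""
    if not keyframes:
        raise ValueError("need at least one manually annotated keyframe")
    keys = sorted(keyframes)
    boxes: List[Box] = []
    for t in range(num_frames):
        if t <= keys[0]:
            boxes.append(keyframes[keys[0]])
        elif t >= keys[-1]:
            boxes.append(keyframes[keys[-1]])
        else:
            # Locate the keyframes t0 <= t < t1 that bracket frame t.
            i = bisect.bisect_right(keys, t)
            t0, t1 = keys[i - 1], keys[i]
            alpha = (t - t0) / (t1 - t0)
            b0, b1 = keyframes[t0], keyframes[t1]
            boxes.append(tuple((1 - alpha) * c0 + alpha * c1 for c0, c1 in zip(b0, b1)))
    return boxes


# Two manual boxes, 10 frames apart; the object moves 10 px to the right.
track = linear_interpolate({0: (0, 0, 10, 10), 10: (10, 0, 20, 10)}, num_frames=12)
print(track[5])   # (5.0, 0.0, 15.0, 10.0)
print(track[11])  # (10, 0, 20, 10), copied from the last keyframe
```

This baseline sees only the two boxes it blends, never the pixels in between. That limitation is presumably why a model that also looks at the video frames, as the title's "visual interpolation" suggests, can get by with fewer manual boxes.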
Abstract
We introduce a unified framework for generic video annotation with bounding boxes. Video annotation is a longstanding problem, as it is a tedious and time-consuming process. We tackle two important challenges of video annotation: (1) automatic temporal interpolation and extrapolation of bounding boxes provided by a human annotator on a subset of all frames, and (2) automatic selection of frames to annotate manually. Our contribution is two-fold: first, we propose a model that has both interpolating and extrapolating capabilities; second, we propose a guiding mechanism that sequentially generates suggestions for what frame to annotate next, based on the annotations made previously. We extensively evaluate our approach on several challenging datasets in simulation and demonstrate a reduction in terms of the number of manual bounding boxes drawn by 60% over linear interpolation and by 35%…
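The abstract's headline metric is the number of manual bounding boxes drawn "in simulation", but the abstract does not spell out how the simulated annotator works. The sketch below is a minimal version under an assumed protocol. A propagated box counts as correct when its intersection-over-union (IoU) with the ground-truth box reaches a threshold; 0.7 is a hypothetical choice here. Every other frame would still need a manual box. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def frames_needing_correction(predicted: Sequence[Box], ground_truth: Sequence[Box],
                              threshold: float = 0.7) -> List[int]:
    """Simulated annotator (hypothetical protocol): a predicted box is accepted
    when its IoU with ground truth is at least `threshold`; every other frame
    still needs a manual box."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground-truth tracks must have equal length")
    return [t for t, (p, g) in enumerate(zip(predicted, ground_truth))
            if iou(p, g) < threshold]


gt = [(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10)]
pred = [(0, 0, 10, 10), (1, 0, 11, 10), (5, 0, 15, 10)]
print(frames_needing_correction(pred, gt))  # [2]
```

Under this assumed protocol, a simulation would alternate between adding one manual box and re-running propagation until the returned list is empty. The number of boxes drawn by then is the kind of count the reported reductions refer to, although the paper's exact protocol may differ.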
Taxonomy
Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Advanced Vision and Imaging
