A Data-Driven Approach for Tag Refinement and Localization in Web Videos
Lamberto Ballan, Marco Bertini, Giuseppe Serra, Alberto Del Bimbo

TL;DR
This paper introduces a data-driven method for automatically refining and localizing tags in web videos by leveraging collective knowledge, visual similarity, and web sources, achieving state-of-the-art results.
Contribution
The proposed approach automatically refines and localizes video tags using a flexible, classifier-free method that exploits web and social media visual data.
Findings
Achieves state-of-the-art results on the DUT-WEBV dataset.
Increases the number of tags on web videos and localizes them temporally at the keyframe level.
Handles open vocabulary scenarios with few parameters.
Abstract
Tagging of visual content is becoming more and more widespread as web-based services and social networks have popularized tagging functionalities among their users. These user-generated tags are used to ease browsing and exploration of media collections, e.g., using tag clouds, or to retrieve multimedia content. However, not all media are equally tagged by users. With current systems it is easy to tag a single photo, and even tagging a part of a photo, like a face, has become common on sites like Flickr and Facebook. On the other hand, tagging a video sequence is more complicated and time-consuming, so users typically tag only the overall content of a video. In this paper we present a method for automatic video annotation that increases the number of tags originally provided by users and localizes them temporally, associating tags with keyframes. Our approach exploits collective knowledge…
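To make the idea of associating tags with keyframes through visual similarity more concrete, here is a minimal sketch of a generic nearest-neighbor tag-voting scheme. It is not the authors' algorithm, which the truncated abstract does not specify. The function names (`cosine`, `localize_tags`), the parameters `k` and `min_votes`, the two-dimensional feature vectors, and the toy reference set are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def localize_tags(keyframes, references, k=3, min_votes=2):
    """Assign tags to each keyframe by voting among its visual neighbors.

    keyframes:  list of feature vectors, one per keyframe (hypothetical features).
    references: list of (feature_vector, set_of_tags) for already-tagged images.
    Returns one sorted list of tags per keyframe.
    """
    result = []
    for feat in keyframes:
        # Rank tagged reference images by similarity to this keyframe.
        ranked = sorted(references, key=lambda r: cosine(feat, r[0]), reverse=True)
        # Each of the k nearest neighbors votes for every tag it carries.
        votes = Counter(tag for _, tags in ranked[:k] for tag in tags)
        # Keep only tags supported by at least min_votes neighbors.
        result.append(sorted(t for t, n in votes.items() if n >= min_votes))
    return result


# Toy data: assumed, not taken from the paper or from DUT-WEBV.
references = [
    ([1.0, 0.0], {"beach", "sea"}),
    ([0.9, 0.1], {"beach", "sand"}),
    ([0.8, 0.2], {"beach"}),
    ([0.0, 1.0], {"city", "night"}),
    ([0.1, 0.9], {"city"}),
    ([0.2, 0.8], {"city", "street"}),
]
keyframes = [[1.0, 0.05], [0.05, 1.0]]

print(localize_tags(keyframes, references))  # [['beach'], ['city']]
```

Because tags are transferred by voting rather than by trained per-concept classifiers, a scheme of this kind can in principle work with an open vocabulary: any tag carried by the reference images can be assigned to a keyframe.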
