Image and Video Mining through Online Learning
Andrew Gilbert, Richard Bowden

TL;DR
This paper introduces an online learning method for image and video recognition that iteratively clusters media by leveraging discriminative signatures, reducing labeling effort and improving accuracy across diverse datasets.
Contribution
It presents a novel online clustering approach using image signatures that express co-occurrence and symbol frequency, enabling scalable and efficient media recognition with minimal labeled data.
Findings
Achieved 86.7% accuracy on the UCF11 video dataset with only 90 labeled videos.
The method scales well, processing 1200 videos in about 1 minute.
The approach works on both images and video, across multiple datasets.
Abstract
Within the field of image and video recognition, the traditional approach is a dataset split into fixed training and test partitions. However, the labelling of the training set is time-consuming, especially as datasets grow in size and complexity. Furthermore, this approach is not applicable to the home user, who wants to intuitively group their media without tirelessly labelling the content. Our interactive approach is able to iteratively cluster classes of images and video. Our approach is based around the concept of an image signature which, unlike a standard bag of words model, can express co-occurrence statistics as well as symbol frequency. We efficiently compute metric distances between signatures despite their inherent high dimensionality and provide discriminative feature selection, to allow common and distinctive elements to be identified from a small set of user labelled…
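To make the idea of a signature that carries both symbol frequency and co-occurrence more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names `build_signature` and `signature_distance` are hypothetical, the input is assumed to be a list of integer visual-word IDs per image or video, and the weighted Jaccard distance is swapped in as one simple metric over such signatures, since the abstract does not specify the distance used.

```python
from collections import Counter
from itertools import combinations


def build_signature(symbols):
    """Build a toy signature from a list of symbol IDs (assumed visual words).

    Single-element keys count how often each symbol occurs (frequency).
    Two-element keys mark each unordered pair of distinct symbols that
    appear together in the same item (co-occurrence).
    """
    sig = Counter()
    for s in symbols:
        sig[(s,)] += 1
    for a, b in combinations(sorted(set(symbols)), 2):
        sig[(a, b)] += 1
    return sig


def signature_distance(sig_a, sig_b):
    """Weighted Jaccard distance: 1 - sum(min) / sum(max) over all keys.

    Chosen here as an illustrative assumption; it is a true metric on
    non-negative count vectors. Two empty signatures have distance 0.
    """
    keys = set(sig_a) | set(sig_b)
    overlap = sum(min(sig_a[k], sig_b[k]) for k in keys)
    total = sum(max(sig_a[k], sig_b[k]) for k in keys)
    return 1.0 - overlap / total if total else 0.0


if __name__ == "__main__":
    item_a = build_signature([1, 2, 2, 3])
    item_b = build_signature([2, 3, 4])
    print(signature_distance(item_a, item_b))
```

Because the signature is stored sparsely as a `Counter`, only the keys that actually occur are compared, which is one simple way to keep distance computation cheap even though the space of possible symbol pairs is very large.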
