Feature sampling and partitioning for visual vocabulary generation on large action classification datasets
Michael Sapienza, Fabio Cuzzolin, Philip H.S. Torr

TL;DR
This paper evaluates feature sampling and partitioning strategies for visual vocabulary creation in large-scale action recognition, demonstrating that effective sampling significantly improves classification performance.
Contribution
It provides a critical assessment of vocabulary construction methods and introduces strategies that enhance performance with smaller vocabularies.
Findings
Strategic feature subsampling improves accuracy.
Partitioning features enhances vocabulary quality.
Achieved state-of-the-art results on five major action recognition datasets.
Abstract
The recent trend in action recognition is towards larger datasets, an increasing number of action classes and larger visual vocabularies. State-of-the-art human action classification in challenging video data is currently based on a bag-of-visual-words pipeline in which space-time features are aggregated globally to form a histogram. The strategies chosen to sample features and construct a visual vocabulary are critical to performance and often dominate it. In this work we provide a critical evaluation of various approaches to building a vocabulary and show that good practices do have a significant impact. By subsampling and partitioning features strategically, we are able to achieve state-of-the-art results on five major action recognition datasets using relatively small visual vocabularies.
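The abstract describes the bag-of-visual-words pipeline only at a high level. The following is a minimal sketch of it under stated assumptions, not the authors' implementation: features are uniformly subsampled before clustering, each vocabulary is learned with plain k-means, every descriptor is hard-assigned to its nearest visual word, and "partitioning" is assumed to mean splitting each descriptor into fixed sub-vectors (for example, per descriptor component) that each get their own vocabulary, with the per-part histograms concatenated. All function names, slice boundaries and sizes are hypothetical, and random data stands in for real space-time features.

```python
import random


def subsample(features, max_count, seed=0):
    """Uniformly subsample at most max_count descriptors (assumed strategy)."""
    if len(features) <= max_count:
        return list(features)
    return random.Random(seed).sample(list(features), max_count)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, v):
    """Index of the visual word closest to descriptor v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centroids act as the visual words."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[i] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids


def bow_histogram(features, vocab):
    """Global histogram of hard word assignments, L1-normalised."""
    hist = [0.0] * len(vocab)
    for f in features:
        hist[nearest(vocab, f)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def partitioned_histogram(features, vocabs, slices):
    """Hypothetical partitioning: encode each descriptor slice with its own
    vocabulary and concatenate the per-part histograms."""
    out = []
    for vocab, (lo, hi) in zip(vocabs, slices):
        out.extend(bow_histogram([f[lo:hi] for f in features], vocab))
    return out


# Usage with synthetic 4-D descriptors split into two 2-D parts.
rng = random.Random(1)
train = [[rng.random() for _ in range(4)] for _ in range(500)]
slices = [(0, 2), (2, 4)]
sample = subsample(train, 200)
vocabs = [kmeans([f[lo:hi] for f in sample], k=8) for lo, hi in slices]
video = [[rng.random() for _ in range(4)] for _ in range(50)]
h = partitioned_histogram(video, vocabs, slices)
print(len(h))  # prints 16 (two parts x 8 words)
```

Subsampling keeps clustering cheap, and learning one small vocabulary per part yields a concatenated representation whose length is the sum of the part vocabulary sizes. Whether these particular choices match the strategies evaluated in the paper is an assumption.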
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
