Image classification by visual bag-of-words refinement and reduction
Zhiwu Lu, Liwei Wang, Ji-Rong Wen

TL;DR
This paper introduces a new framework that refines and reduces the visual bag-of-words model for image classification by leveraging social image tags and semantic spectral clustering, improving efficiency and semantic relevance.
Contribution
The paper proposes a graph-based BOW refinement method that uses image tags, and a semantic spectral clustering approach for BOW reduction, addressing the semantic gap and efficiency issues of the traditional visual BOW model.
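The two contributions can be made concrete with a small sketch of how tag-based refinement and spectral vocabulary reduction might fit together. This is not the authors' algorithm. The following are all assumptions chosen for illustration: the tag graph (cosine similarity of binary tag vectors), the manifold-ranking-style propagation F = (1 - α)(I - αS)^{-1}H, and the Ng-Jordan-Weiss spectral clustering of visual words by their refined occurrence profiles. The function names `refine_with_tags`, `reduce_vocabulary`, and `kmeans` are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing copies
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                C[j] = X[labels == j].mean(axis=0)
    return labels

def refine_with_tags(H, T, alpha=0.5):
    """Hypothetical refinement: smooth BOW histograms H (images x words)
    over an image graph built from tag vectors T (images x tags)."""
    Tn = T / np.maximum(np.linalg.norm(T, axis=1, keepdims=True), 1e-12)
    W = Tn @ Tn.T                       # cosine tag similarity between images
    np.fill_diagonal(W, 0.0)
    d = np.maximum(W.sum(axis=1), 1e-12)
    S = W / np.sqrt(np.outer(d, d))     # D^{-1/2} W D^{-1/2}
    n = H.shape[0]
    # Closed-form propagation F = (1 - alpha)(I - alpha S)^{-1} H
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, H)

def reduce_vocabulary(H, k, seed=0):
    """Hypothetical reduction: spectrally cluster visual words into k groups
    and merge the histogram bins of words in the same group."""
    V = H.T                                               # words x images
    Vn = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
    A = np.clip(Vn @ Vn.T, 0.0, None)                     # word affinity
    np.fill_diagonal(A, 0.0)
    d = np.maximum(A.sum(axis=1), 1e-12)
    S_words = A / np.sqrt(np.outer(d, d))                 # normalized affinity
    _, vecs = np.linalg.eigh(S_words)                     # ascending eigenvalues
    U = vecs[:, -k:]                                      # top-k eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    labels = kmeans(U, k, seed=seed)
    M = np.zeros((H.shape[1], k))
    M[np.arange(H.shape[1]), labels] = 1.0                # word -> cluster map
    return H @ M, labels

# Toy usage on synthetic data: 6 images, 10 visual words, 4 tags
rng = np.random.default_rng(0)
H = rng.poisson(2.0, size=(6, 10)).astype(float)
T = rng.integers(0, 2, size=(6, 4)).astype(float)
H_ref = refine_with_tags(H, T, alpha=0.5)
H_red, labels = reduce_vocabulary(H_ref, k=3)
print(H_red.shape)  # (6, 3)
```

Because each visual word is assigned to exactly one cluster, summing merged bins preserves every image's total mass, so the reduced histograms stay comparable to the refined ones while the vocabulary shrinks from 10 to 3 words.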
Findings
Improved image classification performance on social datasets.
Effective reduction of visual vocabulary size without sacrificing accuracy.
Enhanced semantic relevance of visual words.
Abstract
This paper presents a new framework for visual bag-of-words (BOW) refinement and reduction to overcome the drawbacks of the visual BOW model, which has been widely used for image classification. Although very influential in the literature, the traditional visual BOW model has two distinct drawbacks. Firstly, for efficiency purposes, the visual vocabulary is commonly constructed by directly clustering the low-level visual feature vectors extracted from local keypoints, without considering the high-level semantics of images. That is, the visual BOW model still suffers from the semantic gap, and thus may lead to significant performance degradation in more challenging tasks (e.g. social image classification). Secondly, thousands of visual words are typically generated to obtain better performance on a relatively large image dataset. Due to such large vocabulary size, the…
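The first drawback concerns how the conventional vocabulary is built: low-level keypoint descriptors are clustered directly, with no semantic input. A minimal sketch of that conventional pipeline follows, assuming k-means clustering and hard assignment of each descriptor to its nearest visual word. Random 128-dimensional vectors stand in for SIFT-like descriptors, and the helper names `assign`, `build_vocabulary`, and `bow_histogram` are hypothetical. Real systems typically use thousands of words, which is the second drawback; 16 are used here only for illustration.

```python
import numpy as np

def assign(descriptors, centers):
    """Index of the nearest visual word for each descriptor (squared Euclidean),
    computed as ||x||^2 - 2 x.c + ||c||^2 to avoid a 3-D temporary array."""
    d2 = ((descriptors ** 2).sum(axis=1)[:, None]
          - 2.0 * descriptors @ centers.T
          + (centers ** 2).sum(axis=1)[None, :])
    return np.argmin(d2, axis=1)

def build_vocabulary(descriptors, n_words, n_iter=20, seed=0):
    """Plain k-means over pooled descriptors; the centroids are the visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=n_words, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(descriptors, centers)
        for j in range(n_words):
            members = descriptors[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """L1-normalized count of visual-word occurrences in one image."""
    labels = assign(descriptors, centers)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy usage: 5 "images", each with 50-99 synthetic 128-D descriptors
rng = np.random.default_rng(0)
images = [rng.normal(size=(int(rng.integers(50, 100)), 128)) for _ in range(5)]
vocab = build_vocabulary(np.vstack(images), n_words=16)
X = np.array([bow_histogram(d, vocab) for d in images])
print(X.shape)  # (5, 16)
```

Nothing in this pipeline looks at tags or labels, which is exactly the semantic gap the abstract describes.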
