Revisiting Document Representations for Large-Scale Zero-Shot Learning
Jihyung Kil, Wei-Lun Chao

TL;DR
This paper explores using Wikipedia documents as semantic representations for large-scale zero-shot learning, proposing a semi-automatic method to extract visual sentences that significantly improve recognition performance.
Contribution
It introduces a novel semi-automatic visual sentence extraction method from documents, reducing human effort and enhancing zero-shot learning accuracy on large datasets.
Findings
Achieved a 64% relative improvement on ImageNet with over 10,000 unseen classes.
Showed that document-based semantic representations can serve as a scalable alternative to human-labeled visual attributes while needing much less human effort.
Proposed a weighting scheme to distinguish similar classes in semantic representations.
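To make the "relative improvement" figure above concrete, here is a minimal sketch of how such a number is computed. The accuracy values are made-up placeholders chosen only to illustrate the arithmetic; they are not results from the paper, and the function name is hypothetical.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the gain of `new` over `baseline` as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


# Placeholder accuracies (in percent), NOT taken from the paper.
baseline_acc = 5.0
new_acc = 8.2

gain = relative_improvement(baseline_acc, new_acc)
absolute_gain = new_acc - baseline_acc  # difference in percentage points
```

A relative improvement is measured against the baseline value, so a large relative gain can correspond to a small absolute gain in percentage points when the baseline accuracy is low.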
Abstract
Zero-shot learning aims to recognize unseen objects using their semantic representations. Most existing works use visual attributes labeled by humans, which are not suitable for large-scale applications. In this paper, we revisit the use of documents as semantic representations. We argue that documents like Wikipedia pages contain rich visual information, which, however, can easily be buried by the vast amount of non-visual sentences. To address this issue, we propose a semi-automatic mechanism for visual sentence extraction that leverages the document section headers and the clustering structure of visual sentences. The extracted visual sentences, after a novel weighting scheme to distinguish similar classes, essentially form semantic representations like visual attributes but need much less human effort. On the ImageNet dataset with over 10,000 unseen classes, our representations lead to a 64%…
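The two ideas in the abstract, filtering sentences by section header and weighting them to separate similar classes, can be sketched roughly as follows. This is not the authors' method: the header keywords, the toy documents, and the IDF-style weighting are all assumptions made for illustration, and the clustering step mentioned in the abstract is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical header keywords; the paper's actual header list is not given here.
VISUAL_HEADERS = {"description", "appearance", "characteristics"}


def extract_candidate_sentences(sections):
    """Keep sentences from sections whose header looks visual.

    `sections` maps a section header to that section's text.
    """
    kept = []
    for header, text in sections.items():
        if header.strip().lower() in VISUAL_HEADERS:
            # Naive sentence split: break after ., ! or ? followed by whitespace.
            parts = re.split(r"(?<=[.!?])\s+", text)
            kept.extend(p.strip() for p in parts if p.strip())
    return kept


def tokenize(sentence):
    """Lowercase the sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def idf_weights(class_sentences):
    """IDF-style weights: words used by many classes get a lower weight."""
    n_classes = len(class_sentences)
    doc_freq = Counter()
    for sentences in class_sentences.values():
        words = set()
        for s in sentences:
            words.update(tokenize(s))
        doc_freq.update(words)
    return {w: math.log(n_classes / c) for w, c in doc_freq.items()}


def sentence_weight(sentence, idf):
    """Average IDF of the sentence's tokens (0.0 for an empty sentence)."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(idf.get(t, 0.0) for t in tokens) / len(tokens)


# Toy documents invented for this sketch (not real Wikipedia text).
docs = {
    "zebra": {
        "Description": "Zebras have black and white stripes. They have four legs.",
        "History": "Zebras were named long ago.",
    },
    "horse": {
        "Description": "Horses have a long mane. They have four legs.",
        "Etymology": "The word has old roots.",
    },
}

class_sentences = {
    name: extract_candidate_sentences(secs) for name, secs in docs.items()
}
idf = idf_weights(class_sentences)
```

In this toy setup, a sentence shared by both classes ("They have four legs.") receives zero weight, while a sentence with class-specific words receives a positive weight, which is the intuition behind weighting sentences to distinguish similar classes.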
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Viral Infections and Outbreaks Research
