K-LITE: Learning Transferable Visual Models with External Knowledge
Sheng Shen, Chunyuan Li, Xiaowei Hu, Jianwei Yang, Yujia Xie, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, and Jianfeng Gao

TL;DR
K-LITE introduces a method to incorporate external structured knowledge from sources like WordNet and Wiktionary into visual models, enhancing their transferability and zero-shot learning capabilities in image classification and object detection.
Contribution
The paper proposes a scalable approach that enriches the text side of visual training data with external knowledge, improving transfer learning performance over existing methods.
Findings
Significant improvement in transfer learning accuracy.
Enhanced zero-shot and few-shot learning capabilities.
Effective use of external knowledge sources in vision models.
Abstract
The new generation of state-of-the-art computer vision systems are trained from natural language supervision, ranging from simple object category names to descriptive captions. This form of supervision ensures high generality and usability of the learned visual models, due to the broad concept coverage achieved via large-scale data collection process. Alternatively, we argue that learning with external knowledge is a promising way which leverages a much more structured source of supervision and offers sample efficiency. We propose K-LITE, a simple strategy to leverage external knowledge for building transferable visual systems: In training, it enriches entities in text with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts. In evaluation, the text is also augmented with external…
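The enrichment step described in the abstract can be illustrated with a minimal sketch. It assumes a small in-memory dictionary standing in for WordNet or Wiktionary lookups; the names `TOY_KNOWLEDGE` and `enrich_prompt`, the glosses, and the prompt template are hypothetical and are not taken from the paper, whose exact prompt format is not given here.

```python
# Hypothetical stand-in for WordNet/Wiktionary lookups.
# The glosses below are illustrative only, not quoted from either source.
TOY_KNOWLEDGE = {
    "sashimi": "a Japanese dish of thinly sliced raw fish",
    "tabby": "a cat with a striped or mottled coat",
}


def enrich_prompt(concept, knowledge, template="a photo of a {}"):
    """Build a text prompt and append a dictionary-style gloss if one exists.

    Concepts missing from the knowledge source fall back to the plain prompt.
    """
    prompt = template.format(concept)
    gloss = knowledge.get(concept.lower())
    if gloss is None:
        return prompt + "."
    return f"{prompt}, {gloss}."


# Enrich every class name the same way for training and for evaluation.
prompts = [enrich_prompt(c, TOY_KNOWLEDGE) for c in ["sashimi", "tabby", "zebra"]]
for p in prompts:
    print(p)
```

In a real pipeline the enriched strings would be passed to the text encoder in place of the bare class names or captions. The point of the sketch is only that a rare concept such as "sashimi" arrives with a description made of more common words.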
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
