Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models
Jiabo Huang, Chen Chen, Lingjuan Lyu

TL;DR
This paper introduces a novel model-driven approach to develop versatile vision foundation models by unifying multiple pre-trained models in a shared space, enabling knowledge inheritance without large-scale data or high-end GPUs.
Contribution
The authors propose a joint knowledge transfer and preservation method that unifies multiple pre-trained models, creating a general-purpose VFM capable of multiple vision tasks without extensive data training.
Findings
Outperforms existing data-centric models on four vision tasks.
Effectively integrates knowledge from diverse pre-trained models.
Supports multiple downstream vision applications.
Abstract
Vision foundation models (VFMs) are predominantly developed using data-centric methods. These methods require training on vast amounts of data, usually with high-quality labels, which poses a bottleneck for most institutions that lack both large-scale data and high-end GPUs. On the other hand, many open-source vision models have been pretrained on domain-specific data, enabling them to distill and represent core knowledge in a form that is transferable across diverse applications. Even though these models are highly valuable assets, they remain largely under-explored in empowering the development of a general-purpose VFM. In this paper, we present a new model-driven approach for training VFMs through joint knowledge transfer and preservation. Our method unifies multiple pre-trained teacher models in a shared latent space to mitigate the "imbalanced transfer" issue caused by their…
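The abstract is cut off before it names the cause of the "imbalanced transfer" issue, so the sketch below does not reproduce the paper's method. It is only a minimal illustration of how a multi-teacher distillation loss can become imbalanced, assuming (purely for illustration) that two teachers emit features at very different scales. The teacher names, feature values, and the L2-normalization remedy are all hypothetical choices, not details from the paper.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def per_teacher_losses(student, teachers, normalize):
    """Return one distillation loss per teacher, keyed by teacher name."""
    losses = {}
    for name, feat in teachers.items():
        if normalize:
            # Compare directions only, so feature scale no longer matters.
            losses[name] = mse(l2_normalize(student), l2_normalize(feat))
        else:
            losses[name] = mse(student, feat)
    return losses

# Hypothetical features for one input from two teachers (assumed values).
teachers = {
    "teacher_a": [0.2, -0.1, 0.4],
    "teacher_b": [20.0, -10.0, 40.0],  # same direction, 100x larger scale
}
student = [0.1, 0.0, 0.3]

raw = per_teacher_losses(student, teachers, normalize=False)
balanced = per_teacher_losses(student, teachers, normalize=True)

print("raw:", raw)            # teacher_b's loss dwarfs teacher_a's
print("balanced:", balanced)  # both teachers contribute equally
```

With raw features, summing the two losses would let teacher_b dominate the gradient; after normalization both teachers contribute the same loss here because their features point in the same direction. A shared latent space, as the abstract describes, is a more general way to make teacher signals comparable than this simple rescaling.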
Taxonomy
Topics: Semantic Web and Ontologies
