Super-model ecosystem: A domain-adaptation perspective
Fengxiang He, Dacheng Tao

TL;DR
This paper provides a theoretical foundation for the super-model paradigm in AI, modeling its training process as a two-stage diffusion and establishing bounds on generalization error related to domain discrepancy.
Contribution
It introduces a mathematical model of super-model training as a two-stage diffusion process and derives a PAC-Bayesian generalization bound for domain adaptation.
Findings
The training process can be modeled by Uhlenbeck-Ornstein diffusion converging to Maxwell-Boltzmann distributions.
The generalization error during fine-tuning is the dominant factor in domain adaptation.
Generalization is influenced by a new measure of domain discrepancy based on covariance matrices and local minima shift.
Abstract
This paper attempts to establish the theoretical foundation for the emerging super-model paradigm via domain adaptation, where one first trains a very large-scale model, i.e., a super model (or foundation model in some other papers), on a large amount of data and then adapts it to various specific domains. Super-model paradigms help reduce computational and data costs and carbon emissions, which is critical to the AI industry, especially to the very large number of small and medium-sized enterprises. We model the super-model paradigm as a two-stage diffusion process: (1) in the pre-training stage, the model parameter diffuses from a random initialization and converges to a steady distribution; and (2) in the fine-tuning stage, the model parameter is transported to another steady distribution. Both training stages can be mathematically modeled by the Uhlenbeck-Ornstein process, which converges to two…
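The two-stage picture in the abstract can be illustrated with a small simulation. The sketch below is not the paper's model: it assumes a one-dimensional Uhlenbeck-Ornstein process d(theta) = -k (theta - mu) dt + sigma dW, discretized with the Euler-Maruyama scheme, and every name and value in it (simulate_ou, k, sigma, mu_pre, mu_ft, the step size) is a hypothetical choice made for illustration. Under these assumptions the steady distribution of each stage is Gaussian with mean mu and variance sigma^2 / (2k), which is the Boltzmann-type density for a quadratic potential.

```python
import numpy as np


def simulate_ou(theta0, mu, k, sigma, dt, n_steps, rng):
    """Euler-Maruyama steps for d(theta) = -k (theta - mu) dt + sigma dW.

    theta0 holds many independent particles, so the returned array is a
    sample from the parameter distribution after n_steps * dt time units.
    """
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        theta = theta - k * (theta - mu) * dt + sigma * np.sqrt(dt) * noise
    return theta


# Hypothetical settings, chosen only for illustration.
k, sigma, dt, n_steps = 2.0, 0.5, 0.01, 1000
mu_pre, mu_ft = 1.0, -0.5  # assumed minima for pre-training and fine-tuning

rng = np.random.default_rng(0)
theta_init = rng.normal(0.0, 3.0, size=20000)  # random initialization

# Stage 1 (pre-training): diffuse from the random initialization.
theta_pre = simulate_ou(theta_init, mu_pre, k, sigma, dt, n_steps, rng)

# Stage 2 (fine-tuning): start from the stage-1 sample, move to a new minimum.
theta_ft = simulate_ou(theta_pre, mu_ft, k, sigma, dt, n_steps, rng)

# Variance of the continuous-time steady distribution under these assumptions.
stationary_var = sigma**2 / (2 * k)

print("pre-training  mean/var:", theta_pre.mean(), theta_pre.var())
print("fine-tuning   mean/var:", theta_ft.mean(), theta_ft.var())
print("steady-state variance :", stationary_var)
```

In this toy setting both stages end near the same variance and differ only in their means, so the shift between mu_pre and mu_ft plays the role of a shift between local minima. The discretization introduces a small bias in the empirical variance that shrinks as dt decreases.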
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
Methods: Diffusion
