Sample Complexity of Transfer Learning: An Optimal Transport Approach
Haoyang Cao, Xin Guo, Wenpin Tang, Guan Wang

TL;DR
This paper analyzes the sample efficiency of transfer learning from an optimal transport viewpoint, showing that it can outperform direct learning, especially for high-dimensional data and complex models. Numerical experiments support the analysis.
Contribution
It introduces a rigorous optimal transport-based framework to analyze transfer learning's sample complexity, revealing conditions where transfer learning is more efficient.
Findings
Transfer learning has better sample complexity than direct learning when the data dimension d exceeds 3.
The sample complexity for transfer learning scales as O(m^{-(α+1)/d}), where α indicates the smoothness of the data distribution.
Numerical experiments on image classification demonstrate significant performance gains in data-scarce regimes.
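To give a feel for the rate O(m^{-(α+1)/d}) quoted above, here is a minimal Python sketch that evaluates the rate term for a few sample sizes. It rests on assumptions not stated in this summary: m is read as the number of target samples, the constant hidden by the O(.) notation is set to 1, and the function names `transfer_rate` and `sample_multiplier` are hypothetical helpers, not part of the paper.

```python
def transfer_rate(m: int, alpha: float, d: int) -> float:
    """Rate term m^{-(alpha+1)/d}; the constant hidden by O(.) is assumed to be 1."""
    if m <= 0 or d <= 0:
        raise ValueError("m and d must be positive")
    return m ** (-(alpha + 1.0) / d)


def sample_multiplier(factor: float, alpha: float, d: int) -> float:
    """Factor by which m must grow to shrink the rate term by `factor`.

    Solves (k * m)^{-(alpha+1)/d} = m^{-(alpha+1)/d} / factor for k.
    """
    return factor ** (d / (alpha + 1.0))


if __name__ == "__main__":
    # Illustrative values only: alpha = 1 and d = 4 give an exponent of -1/2.
    for m in (100, 1_000, 10_000):
        print(m, transfer_rate(m, alpha=1.0, d=4))
    # Samples must grow by this factor to halve the rate term.
    print(sample_multiplier(2.0, alpha=1.0, d=4))
```

Under these assumptions, a smoother data distribution (larger α) makes the rate term shrink faster in m, while a larger dimension d slows it down; the multiplier grows as factor^{d/(α+1)}.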
Abstract
Transfer learning is an essential technique for many machine learning/AI models of complex structures such as large language models and generative AI. The essence of transfer learning is to leverage knowledge from resolved source tasks for a new target task, especially when the sample size of the training data for the latter is low. In this work, we rigorously analyze the potential benefit of transfer learning in terms of sample efficiency. Specifically, taking an optimal transport viewpoint of transfer learning, we find that when the data dimension d is higher than 3, the sample complexity for transfer learning is O(m^{-(α+1)/d}), with α indicating the smoothness of the data distribution, as opposed to the sample complexity for direct learning, which is governed by the smoothness of the optimal target model. Our finding theoretically supports a better…
