Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning

Yuyang Deng; Samory Kpotufe

arXiv:2507.04194·stat.ML·July 8, 2025

TL;DR

This paper develops an adaptive SGD method for supervised transfer learning that balances sampling between source and target data, retaining statistical transfer guarantees and convergence without prior knowledge of source quality.

Contribution

It introduces a mixed-sample SGD algorithm that adaptively combines source and target data, with transfer guarantees and a convergence analysis for prediction tasks with convex losses.

Findings

01

Converges at a $1/\sqrt{T}$ rate for linear regression with square loss.

02

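Statistical performance adapts to source quality: the procedure gains from the source when it is informative and biases toward the target otherwise, avoiding negative transfer.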

03

Supported by experiments on synthetic and real datasets.

Abstract

Theoretical works on supervised transfer learning (STL) -- where the learner has access to labeled samples from both source and target distributions -- have for the most part focused on statistical aspects of the problem, while efficient optimization has received less attention. We consider the problem of designing an SGD procedure for STL that alternates sampling between source and target data, while maintaining statistical transfer guarantees without prior knowledge of the quality of the source data. A main algorithmic difficulty is in understanding how to design such an adaptive sub-sampling mechanism at each SGD step, so as to automatically gain from the source when it is informative, or bias towards the target and avoid negative transfer when the source is less informative. We show that such a mixed-sample SGD procedure is feasible for general prediction tasks with convex losses,…
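
The idea of alternating between source and target samples at each SGD step can be illustrated with a minimal sketch for linear regression with square loss. This is not the paper's algorithm: the mixing probability `p_src` is held fixed here, whereas the paper designs an adaptive sub-sampling mechanism whose details are not given in this excerpt. The function name `mixed_sample_sgd`, the parameters `p_src`, `eta`, and `T`, the $\eta/\sqrt{t}$ step size, and the iterate averaging are all assumptions made for illustration.

```python
import numpy as np


def mixed_sample_sgd(X_src, y_src, X_tgt, y_tgt, p_src=0.5, T=5000, eta=0.1, seed=0):
    """Illustrative SGD for least squares with a FIXED source/target mixing rate.

    Hypothetical sketch: at each step one sample is drawn from the source
    with probability p_src, otherwise from the target. The paper's adaptive
    choice of this sampling rule is NOT reproduced here.
    """
    rng = np.random.default_rng(seed)
    d = X_tgt.shape[1]
    w = np.zeros(d)      # current iterate
    w_avg = np.zeros(d)  # running average of the iterates

    for t in range(1, T + 1):
        # Choose which dataset supplies this step's sample.
        if rng.random() < p_src:
            i = rng.integers(len(y_src))
            x, y = X_src[i], y_src[i]
        else:
            i = rng.integers(len(y_tgt))
            x, y = X_tgt[i], y_tgt[i]

        # Gradient of the square loss 0.5 * (x @ w - y)**2 with respect to w.
        grad = (x @ w - y) * x

        # Decaying step size eta / sqrt(t) (an assumed schedule).
        w -= (eta / np.sqrt(t)) * grad

        # Update the running mean of w_1, ..., w_t.
        w_avg += (w - w_avg) / t

    return w_avg
```

Setting `p_src=0.0` reduces the sketch to target-only SGD, and `p_src=1.0` to source-only SGD. Any fixed value in between commits to a level of trust in the source up front, which is the prior knowledge the paper's adaptive mechanism is meant to avoid.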

Videos

Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning · SlidesLive