Starbucks-v2: Improved Training for 2D Matryoshka Embeddings
Shengyao Zhuang, Shuai Wang, Fabio Zheng, Bevan Koopman, Guido Zuccon

TL;DR
Starbucks introduces a novel training strategy for 2D Matryoshka embedding models, combining structured fine-tuning and masked autoencoder pre-training to improve sub-network performance and adaptability across tasks.
Contribution
The paper proposes Starbucks, a new training method that enhances 2D Matryoshka embeddings by integrating structured fine-tuning with MAE pre-training, achieving performance comparable to individually trained models.
Findings
Starbucks outperforms baseline 2D Matryoshka models on multiple benchmarks.
Pre-training with MAE improves sub-network representation quality.
Hybrid models combining depth- and width-wise variants yield additional gains.
Abstract
2D Matryoshka training enables a single embedding model to generate sub-network representations across different layers and embedding dimensions, offering adaptability to diverse computational and task constraints. However, its effectiveness remains well below that of individually trained models of equivalent sizes. To address this, we propose Starbucks, a new training strategy for Matryoshka-style embedding models that combines structured fine-tuning with masked autoencoder (MAE) pre-training. During fine-tuning, we compute the loss over a fixed set of layer-dimension pairs, from small to large, which significantly improves performance over randomly sampled sub-networks and matches that of separately trained models. Our MAE-based pre-training further enhances the representation quality of sub-networks, providing a stronger backbone for downstream tasks. Experiments on both in-domain…
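The structured fine-tuning step described in the abstract (computing the loss over a fixed set of layer-dimension pairs, from small to large) can be illustrated with a minimal sketch. The code below is not the authors' implementation. The pair list `LAYER_DIM_PAIRS`, the in-batch contrastive loss, the temperature, and the helper names `cosine`, `info_nce`, `structured_loss`, and `toy_layers` are all assumptions made for illustration, and random vectors stand in for real encoder outputs.

```python
import math
import random

# Hypothetical fixed (layer, dimension) pairs, ordered from small to large.
# The source does not list the actual pairs; these are illustrative only.
LAYER_DIM_PAIRS = [(2, 32), (4, 64), (6, 128)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(queries, docs, temperature=0.05):
    """Assumed in-batch contrastive loss: docs[i] is the positive for queries[i]."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, d) / temperature for d in docs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(queries)


def structured_loss(query_layers, doc_layers, pairs=LAYER_DIM_PAIRS):
    """Average the loss over a fixed list of (layer, dim) sub-networks.

    query_layers[l - 1][i] is the embedding of example i taken at layer l
    (1-indexed); each sub-network keeps only the first `dim` components.
    """
    losses = []
    for layer, dim in pairs:
        q = [vec[:dim] for vec in query_layers[layer - 1]]
        d = [vec[:dim] for vec in doc_layers[layer - 1]]
        losses.append(info_nce(q, d))
    return sum(losses) / len(losses)


# Toy stand-in for per-layer encoder outputs: 6 layers, batch of 4, 128 dims.
rng = random.Random(0)


def toy_layers(num_layers=6, batch=4, dim=128):
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch)]
        for _ in range(num_layers)
    ]


queries = toy_layers()
docs = toy_layers()
loss = structured_loss(queries, docs)
print(f"average loss over {len(LAYER_DIM_PAIRS)} sub-networks: {loss:.4f}")
```

Iterating over the same fixed pairs at every step, rather than sampling sub-networks at random, means each listed sub-network receives a training signal in every batch, which is the contrast the abstract draws with randomly sampled sub-networks.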
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition
