Dynamic Cluster Data Sampling for Efficient and Long-Tail-Aware Vision-Language Pre-training

Mingliang Liang; Zhuoran Liu; Arjen P. de Vries; Martha Larson

arXiv:2604.27932·cs.CV·May 1, 2026


TL;DR

This paper introduces DynamiCS, a dynamic cluster-based sampling method that reduces training costs and improves long-tail concept representation in vision-language models by adjusting data sampling at each epoch.

Contribution

The paper proposes a dynamic sampling approach that emphasizes long-tail concepts, in contrast to existing methods that flatten the data distribution.

Findings

01 DynamiCS reduces computational cost of VLM training.

02 It improves representation of long-tail concepts.

03 Dynamic sampling outperforms static approaches.

Abstract

The computational cost of training a vision-language model (VLM) can be reduced by sampling the training data. Previous work on efficient VLM pre-training has pointed to the importance of semantic data balance, adjusting the distribution of topics in the data to improve VLM accuracy. However, existing efficient pre-training approaches may disproportionately remove rare concepts from the training corpus. As a result, long-tail concepts remain insufficiently represented in the training data and are not effectively captured during training. In this work, we introduce a dynamic cluster-based sampling approach (DynamiCS) that downsamples large clusters of data and upsamples small ones. The approach is dynamic in that it applies sampling at each epoch. We first show the importance of dynamic sampling for VLM training. Then, we demonstrate the advantage of our cluster-scaling…
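The idea of downsampling large clusters, upsampling small ones, and redrawing the sample at each epoch can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `sample_epoch`, the `budget` parameter, and the power-law exponent `alpha` are assumptions made for illustration, since the abstract does not specify the cluster-scaling rule.

```python
import numpy as np

def sample_epoch(cluster_ids, budget, alpha=0.5, rng=None):
    """Draw training indices for one epoch from pre-computed cluster labels.

    Assumption (not from the paper): each cluster receives a share of the
    budget proportional to size**alpha. alpha=1 keeps the original
    distribution; alpha=0 gives every cluster an equal share.
    """
    rng = np.random.default_rng() if rng is None else rng
    cluster_ids = np.asarray(cluster_ids)
    clusters, sizes = np.unique(cluster_ids, return_counts=True)
    weights = sizes.astype(float) ** alpha
    targets = np.maximum(1, np.round(budget * weights / weights.sum())).astype(int)

    picked = []
    for c, size, target in zip(clusters, sizes, targets):
        members = np.flatnonzero(cluster_ids == c)
        # Target below cluster size: subset without replacement (downsampling).
        # Target above cluster size: draw with replacement (upsampling).
        upsample = bool(target > size)
        picked.append(rng.choice(members, size=int(target), replace=upsample))
    return np.concatenate(picked)

# Toy corpus: one large cluster, one medium cluster, one long-tail cluster.
ids = np.array([0] * 900 + [1] * 90 + [2] * 10)

# Redrawing each epoch means different members of the large cluster are
# seen over time, while the long-tail cluster is repeated within an epoch.
epoch_samples = [
    sample_epoch(ids, budget=300, rng=np.random.default_rng(epoch))
    for epoch in range(3)
]
counts = np.bincount(ids[epoch_samples[0]], minlength=3)
```

In this toy setup the per-epoch sample is roughly a third of the corpus, the 900-item cluster contributes far fewer than 900 items, and the 10-item cluster contributes more than 10. A static approach would draw the sample once and reuse it for every epoch.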


Code & Models

Models

MingliangLiang3/DynamiCS-ViT-B-16-DataComp-DFN (model on Hugging Face)
