Bandit Guided Submodular Curriculum for Adaptive Subset Selection
Prateek Chanda, Prayas Agrawal, Saral Sureka, Lokesh Reddy Polu, Atharv Kshirsagar, Ganesh Ramakrishnan

TL;DR
This paper introduces ONLINESUBMOD, a bandit-based approach for adaptive subset selection in curriculum learning, leveraging submodular functions to improve sample selection efficiency and accuracy in vision and language tasks.
Contribution
It formulates adaptive subset selection as a multi-armed bandit problem and proposes a novel online greedy policy with provable no-regret guarantees.
Findings
Outperforms both traditional curriculum learning and bi-level optimization approaches
Achieves better accuracy-efficiency tradeoffs than those baselines
Effective across vision and language datasets
Abstract
Traditional curriculum learning proceeds from easy to hard samples, yet defining a reliable notion of difficulty remains elusive. Prior work has used submodular functions to induce difficulty scores in curriculum learning. We reinterpret adaptive subset selection and formulate it as a multi-armed bandit problem, where each arm corresponds to a submodular function guiding sample selection. We introduce ONLINESUBMOD, a novel online greedy policy that optimizes a utility-driven reward and provably achieves no-regret performance under various sampling regimes. Empirically, ONLINESUBMOD outperforms both traditional curriculum learning and bi-level optimization approaches across vision and language datasets, showing superior accuracy-efficiency tradeoffs. More broadly, we show that validation-driven reward metrics offer a principled way to guide the curriculum schedule.
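The bandit framing in the abstract can be illustrated with a small toy sketch. This is not the ONLINESUBMOD policy or its reward; it is a plain epsilon-greedy bandit, swapped in for illustration, in which each arm is a submodular function maximized by the standard greedy algorithm. The two arms (facility location and a square-root feature-based function), the names `greedy_select`, `run_bandit`, and `toy_reward`, and the 1-D toy data are all assumptions made for this example. In particular, `toy_reward` is only a stand-in for the validation-driven utility the abstract mentions.

```python
import math
import random

def greedy_select(gain, n_items, budget):
    """Standard greedy maximization: add the item with the largest marginal gain."""
    selected = []
    for _ in range(budget):
        candidates = [i for i in range(n_items) if i not in selected]
        selected.append(max(candidates, key=lambda i: gain(selected, i)))
    return selected

def facility_location_gain(sim):
    """Marginal gain of f(S) = sum_j max_{i in S} sim[j][i] (representativeness)."""
    def gain(selected, i):
        total = 0.0
        for row in sim:
            covered = max((row[s] for s in selected), default=0.0)
            total += max(row[i] - covered, 0.0)
        return total
    return gain

def feature_based_gain(feats):
    """Marginal gain of f(S) = sum_d sqrt(sum_{i in S} feats[i][d]), feats >= 0."""
    def value(items):
        dims = len(feats[0])
        return sum(math.sqrt(sum(feats[i][d] for i in items)) for d in range(dims))
    def gain(selected, i):
        return value(selected + [i]) - value(selected)
    return gain

def run_bandit(arms, reward_fn, n_items, budget, steps, epsilon=0.2, seed=0):
    """Epsilon-greedy over arms (an assumption; not the paper's policy)."""
    rng = random.Random(seed)
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    subset = []
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arms))  # explore
        else:
            arm = max(range(len(arms)), key=lambda a: means[a])  # exploit
        subset = greedy_select(arms[arm], n_items, budget)
        reward = reward_fn(subset)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
    return counts, means, subset

# Toy data: seven 1-D points in [0, 1]; similarity decays with distance.
xs = [0.0, 0.05, 0.1, 0.5, 0.9, 0.95, 1.0]
sim = [[1.0 / (1.0 + abs(a - b)) for b in xs] for a in xs]
feats = [[x, 1.0 - x] for x in xs]
arms = [facility_location_gain(sim), feature_based_gain(feats)]

def toy_reward(subset):
    """Hypothetical stand-in for a validation-driven reward: spread of the subset."""
    return max(xs[i] for i in subset) - min(xs[i] for i in subset)

counts, means, subset = run_bandit(arms, toy_reward, len(xs), budget=3, steps=20)
print(counts, means, subset)
```

Because the toy data never changes, each arm returns the same subset every round. In a training loop the model state would change between rounds, so the reward of each submodular function would drift over time, which is the setting a bandit policy is meant to track.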
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
