Learning to Solve Complex Problems via Dataset Decomposition
Wanru Zhao, Lucas Caccia, Zhengyan Shi, Minseon Kim, Weijia Xu, Alessandro Sordoni

TL;DR
This paper introduces a reverse curriculum learning method that decomposes complex datasets into simpler parts using a teacher-student framework, improving model performance on math and code tasks.
Contribution
It presents a novel recursive dataset decomposition technique with a reasoning-enabled teacher to generate curricula, enhancing learning efficiency for complex problems.
Findings
Models trained with the proposed curriculum outperform models trained with standard methods.
The approach effectively decomposes complex data into simpler components.
Experimental results show improved accuracy on math and code datasets.
Abstract
Curriculum learning is a class of training strategies that organizes the data exposed to a model by difficulty, progressing gradually from simpler to more complex examples. This research explores a reverse curriculum generation approach that recursively decomposes complex datasets into simpler, more learnable components. We propose a teacher-student framework in which the teacher can reason step by step and uses that ability to recursively generate easier versions of examples, enabling the student model to progressively master difficult tasks. We also introduce a novel scoring system that measures data difficulty based on structural complexity and conceptual depth, allowing curriculum construction over the decomposed data. Experiments on math datasets (MATH and AIME) and code generation datasets demonstrate that models trained with curricula generated by our approach exhibit superior…
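The recursive decomposition and easy-to-hard ordering described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `decompose`, `build_curriculum`, `toy_teacher`, and `toy_scorer` are hypothetical, the teacher here is a plain string splitter rather than a reasoning model, and the word-count scorer is only a stand-in for the paper's difficulty score, whose details this page does not give.

```python
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from the paper):
# a teacher maps one problem to a list of simpler sub-problems,
# and a scorer maps a problem to a difficulty value.
Teacher = Callable[[str], List[str]]
Scorer = Callable[[str], float]


def decompose(problem: str, teacher: Teacher, max_depth: int) -> List[str]:
    """Return the problem plus all simpler variants found recursively."""
    collected = [problem]
    if max_depth == 0:
        return collected
    for sub in teacher(problem):
        collected.extend(decompose(sub, teacher, max_depth - 1))
    return collected


def build_curriculum(problems: List[str], teacher: Teacher,
                     scorer: Scorer, max_depth: int = 2) -> List[str]:
    """Decompose every problem, drop duplicates, sort from easy to hard."""
    pool: List[str] = []
    seen = set()
    for problem in problems:
        for item in decompose(problem, teacher, max_depth):
            if item not in seen:
                seen.add(item)
                pool.append(item)
    return sorted(pool, key=scorer)


# Toy stand-ins (assumptions): split on " and ", score by word count.
def toy_teacher(problem: str) -> List[str]:
    parts = problem.split(" and ")
    return parts if len(parts) > 1 else []


def toy_scorer(problem: str) -> float:
    return float(len(problem.split()))


hard_problem = "solve x + 1 = 3 and solve 2y = 8 and add the results"
curriculum = build_curriculum([hard_problem], toy_teacher, toy_scorer)
for step in curriculum:
    print(step)
```

In this toy run the three sub-problems come first and the original compound problem comes last, which mirrors the idea of training the student on decomposed pieces before the full task. In a realistic setup, `toy_teacher` would be replaced by a call to a reasoning-capable model and `toy_scorer` by whatever difficulty measure is being used.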
Taxonomy
Topics: Machine Learning and Data Classification · Statistics Education and Methodologies · Topic Modeling
