Let's Be Self-generated via Step by Step: A Curriculum Learning Approach to Automated Reasoning with Large Language Models
Kangyang Luo, Zichen Ding, Zhenmin Weng, Lingfeng Qiao, Meng Zhao, Xiang Li, Di Yin, Jinlong Shu

TL;DR
This paper introduces LBS3, a curriculum learning-inspired prompting method for large language models that improves reasoning by progressively guiding models from easy to hard queries, reducing manual effort and enhancing performance.
Contribution
LBS3 is a novel prompt approach that mimics human learning by gradually increasing query difficulty, leading to better reasoning quality in LLMs without relying on external data.
Findings
LBS3 outperforms existing methods on reasoning tasks.
It reduces manual prompt engineering effort.
It achieves state-of-the-art results across various LLMs.
Abstract
While Chain of Thought (CoT) prompting approaches have significantly strengthened the reasoning capabilities of large language models (LLMs), they still either require extensive human effort or leave room for performance improvement. Existing work has tried to bridge these gaps; however, these approaches either hinge on external data and cannot completely eliminate manual effort, or they fall short in effectively directing LLMs to generate high-quality exemplary prompts. To address these pitfalls, we propose a novel prompting approach for automatic reasoning named LBS3, inspired by curriculum learning, which better reflects human learning habits. Specifically, LBS3 initially steers LLMs to recall easy-to-hard proxy queries that are pertinent to the target query. Following this, it invokes a progressive strategy that utilizes exemplary prompts stemmed from…
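The two stages named in the abstract (recalling easy-to-hard proxy queries, then solving progressively) can be sketched as follows. This is a minimal illustration of one possible reading, not the authors' implementation: the abstract is truncated here, so the prompt wording, the `LLM` callable type, the function names, the default counts, and the choice to reuse every earlier solved query as an exemplar are all assumptions.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


def generate_proxy_queries(llm: LLM, target: str, n_easy: int, n_hard: int) -> List[str]:
    """Stage 1 (assumed form): ask for related queries, easy ones first, then hard."""
    queries: List[str] = []
    for level, count in (("easy", n_easy), ("hard", n_hard)):
        prompt = (
            f"Target problem: {target}\n"
            f"Write {count} {level} problems related to the target, one per line."
        )
        lines = [ln.strip() for ln in llm(prompt).splitlines() if ln.strip()]
        queries.extend(lines[:count])  # keep at most `count` queries per level
    return queries


def build_prompt(exemplars: List[Tuple[str, str]], query: str) -> str:
    """Format solved (query, answer) pairs as few-shot exemplars before the new query."""
    parts = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    # The trigger phrase below is an assumed zero-shot CoT cue, not taken from the paper.
    parts.append(f"Q: {query}\nA: Let's think step by step.")
    return "\n\n".join(parts)


def solve_progressively(llm: LLM, target: str, n_easy: int = 2, n_hard: int = 1) -> str:
    """Stage 2 (assumed form): solve proxies in order, reusing earlier solutions."""
    exemplars: List[Tuple[str, str]] = []
    for query in generate_proxy_queries(llm, target, n_easy, n_hard):
        answer = llm(build_prompt(exemplars, query))
        exemplars.append((query, answer))
    # Finally, answer the target query with all solved proxies as exemplars.
    return llm(build_prompt(exemplars, target))
```

Passing any prompt-to-text function as `llm` runs the pipeline end to end; no external exemplar data is required, which matches the self-generated setting the abstract describes.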
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. The paper is well-written and easy to follow.
2. The performance of LBS3 is strong.
3. The idea of LBS3 is simple and effective.
1. Lack of theoretical contribution. Although the performance of LBS3 is quite promising, its technical contribution compared to existing methods (such as Ana-Pro) seems minor.
2. To enhance the paper's contribution, it would be advantageous to provide insights into how simpler exemplars can improve an LLM's accuracy on more challenging exemplars.
* The prompting method outperforms many baselines on complex tasks, as verified across different LLMs.
* The method makes sense intuitively.
* More ablation studies are needed to show that the components proposed in this paper are necessary.
- The idea of easy-to-hard prompting is natural, and the solution is simple and elegant, which is well appreciated.
- For the experiments, the choices of models, datasets, and baselines are all reasonably thorough and solid.
- I feel like the paper could have been a bit stronger on the analysis. The paper does ablations on the number of examples used in each of the two stages (Figure 3), examines using only easy or only hard examples (Figure 5), and compares against the baselines. It demonstrates the effectiveness of the method, but it doesn't really tell me why the method is effective. For example, I would be interested in the types of errors that are made as a result of the suboptimal ablations; can the reason for fai
Taxonomy
Topics: Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning
