Guiding Through Complexity: What Makes Good Supervision for Hard Math Reasoning Tasks?
Xuan He, Da Yin, Nanyun Peng

TL;DR
This paper investigates how supervision of varying quality, from error-prone solutions to hard tasks to more accurate solutions to easier subtasks, can effectively train large language models for complex math reasoning. It reports counterintuitive findings and proposes new data strategies.
Contribution
It empirically compares strategies for supervising with imperfect hard-task data and with easier subtask data, showing that imperfect hard-task supervision can outperform perfect easy-task supervision on hard reasoning tasks.
Findings
Training on hard-task supervision with a high outcome error rate can outperform training on perfect easy-task supervision.
Step-wise error rates critically impact training effectiveness.
Supplementing hard-task supervision with subtask supervision improves performance more than rephrasing does.
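The findings above distinguish between outcome errors (a wrong final answer) and step-wise errors (wrong intermediate steps). The following is a minimal sketch of how the two rates could differ on the same data. The `Solution` class, both function names, and the toy data are hypothetical and invented for illustration; the paper's exact definitions are not given in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    """One annotated solution (hypothetical structure, not from the paper)."""
    step_correct: List[bool]      # True if the corresponding reasoning step is correct
    final_answer_correct: bool    # True if the final answer is correct


def outcome_error_rate(solutions: List[Solution]) -> float:
    """Fraction of solutions whose final answer is wrong."""
    wrong = sum(1 for s in solutions if not s.final_answer_correct)
    return wrong / len(solutions)


def stepwise_error_rate(solutions: List[Solution]) -> float:
    """Fraction of individual reasoning steps that are wrong, pooled over all solutions."""
    total_steps = sum(len(s.step_correct) for s in solutions)
    wrong_steps = sum(1 for s in solutions for ok in s.step_correct if not ok)
    return wrong_steps / total_steps


# Toy data, made up for illustration only: most final answers are wrong,
# yet most individual steps are still correct.
toy = [
    Solution([True, True, True, False], False),
    Solution([True, True, True, True], True),
    Solution([True, False, True, True], False),
    Solution([True, True, True, False], False),
]

print(outcome_error_rate(toy))   # 0.75
print(stepwise_error_rate(toy))  # 0.1875
```

In this toy set, three of four final answers are wrong, but only three of sixteen steps are. Under these assumed definitions, a dataset can have a high outcome error rate while most of its intermediate reasoning remains correct.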
Abstract
How can "weak teacher models," such as average human annotators or existing AI systems, effectively supervise LLMs to improve performance on hard reasoning tasks, especially those that challenge the teacher models and require expertise or daily practice from them? In this paper, we seek empirical answers to this question by investigating various data-driven strategies that offer supervision data at different quality levels on tasks of varying complexity. Two intuitive strategies emerge for teacher models to provide supervision during alignment training: 1) using lower-quality supervision from complete tasks that match the difficulty of the target reasoning tasks, and 2) leveraging higher-quality supervision from easier subtasks that are less challenging. Interestingly, we find that even when the outcome error rate for hard task supervision is high (e.g., 90%), training on such data…
Taxonomy
Topics: Educational and Psychological Assessments
