# Beyond Shallow Heuristics: Leveraging Human Intuition for Curriculum Learning

**Authors:** Vanessa Toborek, Sebastian Müller, Tim Selbach, Tamás Horváth, Christian Bauckhage

arXiv: 2508.19873 · 2025-08-28

## TL;DR

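This paper tests whether human-curated simple language, taken from the article-level labels of the Simple Wikipedia corpus, can serve as a difficulty signal for curriculum learning in language model pre-training. With a BERT-tiny model, adding simple data alone yields no clear benefit, but ordering it as a curriculum, especially with the simple data first, consistently improves perplexity.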

## Contribution

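The paper introduces label-based curricula that use human-assigned simplicity labels as the difficulty signal, and compares them with competence-based curricula built on shallow heuristics. The label-based curricula improve perplexity, while the competence-based ones show no consistent gains over random ordering.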

## Key findings

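- Adding human-curated simple data alone, without a curriculum, yields no clear benefit.
- Structuring that data as a curriculum, especially when it is introduced first, consistently improves perplexity, particularly on simple language.
- Competence-based curricula do not consistently outperform random ordering, probably because their shallow heuristics fail to separate the two classes effectively.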
- Human intuition can effectively guide curriculum learning for language models.
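
The findings above are reported in terms of perplexity. As a reminder of what that metric measures, here is a minimal sketch that assumes the usual definition: the exponential of the mean per-token negative log-likelihood (natural log). The helper name `perplexity` is hypothetical, and this summary does not say how the paper computes the metric for its masked BERT-tiny model, so treat this as an illustration only.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) over the given tokens.

    `token_nlls` holds one negative log-likelihood (natural log) per
    evaluated token. Assumption: the standard perplexity definition;
    the paper's exact evaluation protocol is not given in this summary.
    """
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))
```

Lower values are better: a model that assigns probability 1/4 to every evaluated token has a perplexity of 4.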

## Abstract

Curriculum learning (CL) aims to improve training by presenting data from "easy" to "hard", yet defining and measuring linguistic difficulty remains an open challenge. We investigate whether human-curated simple language can serve as an effective signal for CL. Using the article-level labels from the Simple Wikipedia corpus, we compare label-based curricula to competence-based strategies relying on shallow heuristics. Our experiments with a BERT-tiny model show that adding simple data alone yields no clear benefit. However, structuring it via a curriculum, especially when introduced first, consistently improves perplexity, particularly on simple language. In contrast, competence-based curricula lead to no consistent gains over random ordering, probably because they fail to effectively separate the two classes. Our results suggest that human intuition about linguistic difficulty can guide CL for language model pre-training.
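
To make the label-based idea in the abstract concrete, the sketch below orders a labeled corpus so that one class is seen before the other, alongside a random-ordering baseline. It is an illustration under stated assumptions, not the authors' implementation: the label strings `"simple"` and `"standard"`, the function names, and the shuffle-within-class choice are all hypothetical.

```python
import random
from typing import List, Tuple

# (text, label) pairs. The label strings "simple" and "standard" are
# assumptions for illustration, not names taken from the paper.
Example = Tuple[str, str]


def label_curriculum(
    examples: List[Example], simple_first: bool = True, seed: int = 0
) -> List[Example]:
    """Order examples by class: all of one label, then all of the other.

    Examples are shuffled within each class, so only the class order
    carries the curriculum signal.
    """
    rng = random.Random(seed)
    simple = [e for e in examples if e[1] == "simple"]
    standard = [e for e in examples if e[1] == "standard"]
    rng.shuffle(simple)
    rng.shuffle(standard)
    return simple + standard if simple_first else standard + simple


def random_ordering(examples: List[Example], seed: int = 0) -> List[Example]:
    """Baseline: the same examples in a uniformly shuffled order."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return shuffled
```

Calling `label_curriculum(corpus)` gives the simple-first ordering the abstract highlights, and `simple_first=False` gives the reverse ordering for comparison.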

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.19873/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/2508.19873/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/2508.19873/full.md

---
Source: https://tomesphere.com/paper/2508.19873