FRAME: Boosting LLMs with A Four-Quadrant Multi-Stage Pretraining Strategy

Xuemiao Zhang; Feiyu Duan; Liangyu Xu; Yongwei Zhou; Sirui Wang; Rongxiang Weng; Jingang Wang; Xunliang Cai

arXiv:2502.05551·cs.CL·June 3, 2025

Open Access

TL;DR

This paper introduces FRAME, a four-quadrant multi-stage pretraining strategy for large language models. It partitions pretraining data by perplexity (PPL) and PPL difference (PD) and trains on the resulting quadrants in four stages, achieving a 16.8% average improvement over random data partitioning.

Contribution

The paper presents a novel four-quadrant multi-stage pretraining method that partitions data by quantitative criteria (PPL and PD) rather than the intuitive heuristics used in prior multi-stage approaches, improving LLM performance.

Findings

01

Achieves 16.8% average improvement over random data partitioning.

02

Organizing data by perplexity (PPL) and PPL difference (PD) causes significant loss reductions.

03

Four-stage pretraining strategy effectively boosts LLM performance.

Abstract

Large language models (LLMs) have significantly advanced human language understanding and generation, with pretraining data quality and organization being crucial to their performance. Multi-stage pretraining is a promising approach, but existing methods often lack quantitative criteria for data partitioning and instead rely on intuitive heuristics. In this paper, we propose the novel Four-quadRAnt Multi-stage prEtraining strategy (FRAME), guided by the established principle of organizing the pretraining process into four stages to achieve significant loss reductions four times. This principle is grounded in two key findings: first, training on high Perplexity (PPL) data followed by low PPL data, and second, training on low PPL difference (PD) data followed by high PD data. Both orderings cause the loss to drop significantly twice and improve performance. By partitioning data into four…
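As a rough illustration of the partitioning idea in the abstract, the sketch below splits samples into four quadrants by PPL and PD and returns them as four training stages. It is not the authors' implementation. The function name `partition_into_stages`, the use of the median as the split threshold, and the assumption that each sample already carries a precomputed PPL and PD score are all hypothetical. The stage order shown is only one ordering that respects "high PPL before low PPL" overall and "low PD before high PD" within each PPL half; the truncated abstract does not state the order FRAME actually uses.

```python
from statistics import median


def partition_into_stages(samples):
    """Split samples into four stages by PPL and PD.

    samples: list of (sample_id, ppl, pd_score) tuples with precomputed
    scores (how PPL and PD are computed is outside this sketch).
    Returns a list of four lists of sample ids, one per stage.
    """
    # Assumption: the median of each score is used as the quadrant boundary.
    ppl_cut = median(ppl for _, ppl, _ in samples)
    pd_cut = median(pd_score for _, _, pd_score in samples)

    # Keys are (is_high_ppl, is_high_pd). Assumed stage order:
    # high PPL/low PD, high PPL/high PD, low PPL/low PD, low PPL/high PD.
    stage_order = [(True, False), (True, True), (False, False), (False, True)]
    quadrants = {key: [] for key in stage_order}

    for sample_id, ppl, pd_score in samples:
        quadrants[(ppl > ppl_cut, pd_score > pd_cut)].append(sample_id)

    return [quadrants[key] for key in stage_order]


if __name__ == "__main__":
    toy = [("a", 10.0, 0.1), ("b", 10.0, 0.9), ("c", 2.0, 0.1), ("d", 2.0, 0.9)]
    for stage, ids in enumerate(partition_into_stages(toy), start=1):
        print(f"stage {stage}: {ids}")
```

A fixed global threshold is the simplest choice for a sketch; the paper may use different thresholds or a smoother transition between stages.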

Taxonomy

Topics: Natural Language Processing Techniques