Dynamics of Instruction Fine-Tuning for Chinese Large Language Models

Chiyu Song; Zhanchao Zhou; Jianhao Yan; Yuejiao Fei; Zhenzhong Lan; Yue Zhang

arXiv:2310.19651·cs.CL·March 4, 2025·1 cites

TL;DR

This paper systematically studies how data quantity, model size, and data construction influence instruction tuning of Chinese LLMs, revealing ability-specific scaling behaviors and strategies for efficient training.

Contribution

The paper presents a comprehensive analysis of how instruction tuning affects Chinese LLMs, highlighting ability-specific sensitivity to scaling and training strategies tailored to it.

Findings

01 Some abilities are more responsive to scaling than others.

02 Scaling sensitivity is explained by Complexity and Transference features.

03 Tailored training strategies improve performance on benchmarks.

Abstract

Instruction tuning is a burgeoning method to elicit the general intelligence of Large Language Models (LLMs). While numerous studies have examined the impact of factors such as data volume and model size on English models, the scaling properties of instruction tuning in other languages remain largely unexplored. In this work, we systematically investigate the effects of data quantity, model size, and data construction methods on instruction tuning for Chinese LLMs. We utilize a newly curated dataset, DoIT, which includes over 40,000 high-quality instruction instances covering ten underlying abilities, such as creative writing, code generation, and logical reasoning. Our experiments, conducted on models ranging from 7b to 33b parameters, yield three key findings: (i) While these factors directly affect overall model performance, some abilities are more responsive to scaling, whereas…

Code & Models

Repositories

chiyusong/dynamics-of-instruction-tuning
PyTorch · Official

Datasets

ChiyuSONG/dynamics-of-instruction-tuning
dataset · 282 downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification

Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Residual Connection · Byte Pair Encoding · Dense Connections · Layer Normalization · Label Smoothing