Packing Analysis: Packing Is More Appropriate for Large Models or Datasets in Supervised Fine-tuning

Shuhe Wang; Guoyin Wang; Yizhong Wang; Jiwei Li; Eduard Hovy; Chen Guo

arXiv:2410.08081·cs.LG·November 7, 2024

TL;DR

This paper provides a comprehensive analysis of packing versus padding in supervised fine-tuning, evaluating efficiency, performance, and practical considerations across various models and datasets.

Contribution

It offers the first extensive comparison of packing and padding in SFT, including benchmarks, practical insights, and open-source tools for future research.

Findings

01. Packing is more suitable for large models and datasets in SFT.

02. Packing can improve training efficiency without sacrificing performance.

03. The effectiveness of packing depends on model size and dataset characteristics.

Abstract

Packing, initially utilized in the pre-training phase, is an optimization technique designed to maximize hardware resource efficiency by combining different training sequences to fit the model's maximum input length. Although it has demonstrated effectiveness during pre-training, there remains a lack of comprehensive analysis for the supervised fine-tuning (SFT) stage on the following points: (1) whether packing can effectively enhance training efficiency while maintaining performance, (2) the suitable size of the model and dataset for fine-tuning with the packing method, and (3) whether packing unrelated or related training samples might cause the model to either excessively disregard or over-rely on the context. In this paper, we perform extensive comparisons between SFT methods using padding and packing, covering SFT datasets ranging from 69K to 1.2M and models from 8B to 70B. This…
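The difference between padding and packing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the first-fit-in-order greedy strategy, and the `pad_id` / `eos_id` values are all assumptions chosen for illustration, and the sketch omits the attention masking and loss masking that a real SFT pipeline would need.

```python
from typing import List

def pad_batch(seqs: List[List[int]], max_len: int, pad_id: int = 0) -> List[List[int]]:
    """Padding: one sequence per row, truncated then filled with pad_id up to max_len."""
    rows = []
    for s in seqs:
        s = s[:max_len]
        rows.append(s + [pad_id] * (max_len - len(s)))
    return rows

def pack_greedy(seqs: List[List[int]], max_len: int,
                eos_id: int = 1, pad_id: int = 0) -> List[List[int]]:
    """Packing (hypothetical greedy variant): concatenate sequences in order,
    each followed by eos_id, and start a new row when the next one does not fit."""
    rows: List[List[int]] = []
    current: List[int] = []
    for s in seqs:
        piece = s[: max_len - 1] + [eos_id]  # leave room for the separator
        if len(current) + len(piece) > max_len:
            # Close the current row, padding only the leftover tail.
            rows.append(current + [pad_id] * (max_len - len(current)))
            current = []
        current.extend(piece)
    if current:
        rows.append(current + [pad_id] * (max_len - len(current)))
    return rows

def pad_fraction(rows: List[List[int]], pad_id: int = 0) -> float:
    """Fraction of positions occupied by pad_id (assumes pad_id is not a real token)."""
    total = sum(len(r) for r in rows)
    pads = sum(r.count(pad_id) for r in rows)
    return pads / total
```

With four short sequences and a maximum length of 8, padding produces four mostly empty rows while this packing variant produces two nearly full ones, which is the hardware-efficiency argument the abstract makes. Whether the packed rows hurt or help model quality is the empirical question the paper studies.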

Code & Models

Repositories

shuhewang1998/packing-analysis (PyTorch, official)

Taxonomy

Topics: Optimization and Packing Problems · VLSI and FPGA Design Techniques

Methods: Shrink and Fine-Tune