TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models
Chen Zhang, Chengguang Tang, Dading Chong, Ke Shi, Guohua Tang, Feng Jiang, Haizhou Li

TL;DR
TS-Align introduces a scalable, automatic method for aligning large language models by leveraging teacher-student collaboration to mine feedback and iteratively improve model policies without extensive human data collection.
Contribution
The paper presents a novel teacher-student framework that automates feedback mining and iterative fine-tuning, reducing reliance on costly human preference data for LLM alignment.
Findings
The final aligned policy outperforms the base policy, with a 69.7% win rate
The teacher's ranking ability is effectively distilled into the student model
The framework enables scalable iterative alignment without collecting new human feedback for each update
Abstract
Mainstream approaches to aligning large language models (LLMs) heavily rely on human preference data, particularly when models require periodic updates. The standard process for iterative alignment of LLMs involves collecting new human feedback for each update. However, the data collection process is costly and challenging to scale. To address this issue, we introduce the "TS-Align" framework, which fine-tunes a policy model using pairwise feedback data automatically mined from its outputs. This automatic mining process is efficiently accomplished through the collaboration between a large-scale teacher model and a small-scale student model. The policy fine-tuning process can be iteratively repeated using on-policy generations within our proposed teacher-student collaborative framework. Through extensive experiments, we demonstrate that our final aligned policy outperforms the base…
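The mining loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the teacher and student divide the work, so the sketch assumes the small student cheaply ranks all sampled responses and the large teacher only re-judges the selected best/worst pair. All names (`mine_preference_pairs`, `iterative_alignment`, `update_policy`, and the toy scorers) are hypothetical.

```python
from typing import Callable, List, Tuple

Generate = Callable[[str, int], List[str]]   # (prompt, k) -> k sampled responses
Scorer = Callable[[str, str], float]         # (prompt, response) -> quality score
Pair = Tuple[str, str, str]                  # (prompt, chosen, rejected)


def mine_preference_pairs(prompts: List[str], generate: Generate,
                          student_score: Scorer, teacher_score: Scorer,
                          k: int = 4) -> List[Pair]:
    """Mine pairwise feedback from the policy's own (on-policy) outputs."""
    pairs: List[Pair] = []
    for prompt in prompts:
        candidates = generate(prompt, k)
        # Assumed cheap pass: the small student scores every candidate.
        ranked = sorted(candidates, key=lambda r: student_score(prompt, r))
        worst, best = ranked[0], ranked[-1]
        if worst == best:
            continue  # no usable preference signal for this prompt
        # Assumed expensive pass: the large teacher judges only the selected pair.
        if teacher_score(prompt, best) < teacher_score(prompt, worst):
            best, worst = worst, best
        pairs.append((prompt, best, worst))
    return pairs


def iterative_alignment(prompts: List[str], generate: Generate,
                        student_score: Scorer, teacher_score: Scorer,
                        update_policy: Callable[[Generate, List[Pair]], Generate],
                        rounds: int = 2, k: int = 4) -> Generate:
    """Repeat mining and fine-tuning; update_policy is a hypothetical training step."""
    for _ in range(rounds):
        pairs = mine_preference_pairs(prompts, generate, student_score, teacher_score, k)
        generate = update_policy(generate, pairs)
    return generate


# Toy stand-ins so the sketch runs end to end (purely illustrative).
def toy_generate(prompt: str, k: int) -> List[str]:
    return [f"{prompt} answer" + "!" * i for i in range(k)]

def toy_student(prompt: str, response: str) -> float:
    return float(len(response))           # student prefers longer responses

def toy_teacher(prompt: str, response: str) -> float:
    return -float(response.count("!"))    # teacher penalizes exclamation marks

pairs = mine_preference_pairs(["q"], toy_generate, toy_student, toy_teacher, k=3)
```

In this toy run the teacher overrules the student's ordering, so the mined pair prefers the response without exclamation marks. In a real setting `update_policy` would be a preference-optimization step on the mined pairs, and the teacher's corrected labels could also be used to refresh the student.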
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Natural Language Processing Techniques · Topic Modeling
Methods: Balanced Selection
