InsBank: Evolving Instruction Subset for Ongoing Alignment
Jiayi Shi, Yiwei Li, Shaoxiong Feng, Peiwen Yuan, Xinglin Wang, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Huan Ren, Yao Hu, Kan Li

TL;DR
InsBank is a continuously evolving repository of instruction data for large language models. Its companion framework, PIBE, makes data selection more efficient and more diverse to support ongoing model alignment.
Contribution
The paper proposes PIBE, a framework for evolving InsBank effectively over time that combines diversity and quality scores to select better instruction data.
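Combining the two scores can be illustrated with a weighted sum or a product, the two operations one review attributes to PIBE. The sketch below is a minimal illustration, not the paper's code. It assumes per-example scores that are already aligned and normalized to [0, 1]. `combine_scores`, `select_top_k`, and the weight `alpha` are hypothetical names.

```python
from typing import List


def combine_scores(quality: List[float], diversity: List[float],
                   mode: str = "add", alpha: float = 0.5) -> List[float]:
    """Merge per-example quality and diversity scores into one ranking score.

    Hypothetical helper (not from the paper); assumes aligned lists in [0, 1].
    mode="add": weighted sum alpha * q + (1 - alpha) * d
    mode="mul": product q * d (alpha is ignored)
    """
    if len(quality) != len(diversity):
        raise ValueError("score lists must have equal length")
    if mode == "add":
        return [alpha * q + (1 - alpha) * d for q, d in zip(quality, diversity)]
    if mode == "mul":
        return [q * d for q, d in zip(quality, diversity)]
    raise ValueError(f"unknown mode: {mode}")


def select_top_k(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring examples, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


quality = [1.0, 0.5, 0.3]
diversity = [0.2, 0.5, 0.4]
print(select_top_k(combine_scores(quality, diversity, "add"), 1))  # [0]
print(select_top_k(combine_scores(quality, diversity, "mul"), 1))  # [1]
```

With these toy scores the additive rule picks the example with the highest quality. The product favors the balanced example instead, because a low value on either axis pulls the product down.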
Findings
PIBE outperforms baselines in InsBank evolution
Effectively extracts budget-specific instruction subsets
Enhances ongoing LLM alignment
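One simple way to serve several budgets from a single bank is to rank it once by a combined score and take prefixes of that ranking, so a smaller budget always yields a subset of a larger one. This nested-prefix scheme is an assumption for illustration. The summary does not say how PIBE extracts budget-specific subsets, and `subsets_for_budgets` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def subsets_for_budgets(ids: Sequence[str], scores: Sequence[float],
                        budgets: Sequence[int]) -> Dict[int, List[str]]:
    """Map each budget to the top-`budget` examples of one shared ranking.

    Hypothetical helper: every subset is a prefix of the same ranking,
    so subsets for smaller budgets are nested inside larger ones.
    """
    order = sorted(range(len(ids)), key=lambda i: scores[i], reverse=True)
    ranked = [ids[i] for i in order]
    return {b: ranked[:b] for b in budgets}


subsets = subsets_for_budgets(["x1", "x2", "x3", "x4", "x5"],
                              [0.2, 0.9, 0.5, 0.7, 0.1],
                              budgets=[1, 3, 5])
print(subsets)
```

Serving an extra budget costs only a slice of the existing ranking. The trade-off is that a fixed ranking does not re-optimize diversity for each subset size.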
Abstract
Large language models (LLMs) typically undergo instruction tuning to enhance alignment. Recent studies emphasize that the quality and diversity of instruction data are more crucial than its quantity, highlighting the need to select diverse, high-quality subsets to reduce training costs. However, how to evolve these selected subsets alongside the development of new instruction data remains insufficiently explored. To achieve LLMs' ongoing alignment, we introduce Instruction Bank (InsBank), a continuously updated repository that integrates the latest valuable instruction data. We further propose Progressive Instruction Bank Evolution (PIBE), a novel framework designed to evolve InsBank effectively and efficiently over time. PIBE employs a gradual data selection strategy to maintain long-term efficiency, leveraging a representation-based diversity score to capture relationships…
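Gradual, round-by-round selection can be illustrated with a toy loop. This is a sketch under stated assumptions, not PIBE itself. In each round the current bank is pooled with a new batch, every pooled example is re-scored, and only the top `capacity` examples are kept, so storage stays bounded however much data streams in. `evolve_bank`, `capacity`, and `toy_score` are hypothetical. The toy score (quality plus distance to the nearest other example on a 1-D embedding) stands in for PIBE's actual quality and representation-based diversity scores.

```python
from typing import Callable, List, Tuple

# Toy example format (assumption): (example_id, quality, 1-D embedding)
Example = Tuple[str, float, float]


def toy_score(pool: List[Example]) -> List[float]:
    """Hypothetical stand-in score: quality + distance to nearest other example."""
    scores = []
    for i, (_, q, x) in enumerate(pool):
        nearest = min((abs(x - y) for j, (_, _, y) in enumerate(pool) if j != i),
                      default=0.0)
        scores.append(q + nearest)
    return scores


def evolve_bank(bank: List[Example], new_batch: List[Example], capacity: int,
                score_fn: Callable[[List[Example]], List[float]] = toy_score
                ) -> List[Example]:
    """One evolution round: pool old bank with new data, re-score, keep the best."""
    pool = bank + new_batch
    scores = score_fn(pool)
    order = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return [pool[i] for i in order[:capacity]]


bank: List[Example] = []
stream = [
    [("a", 0.9, 0.0), ("b", 0.8, 0.1), ("c", 0.5, 5.0)],
    [("d", 0.7, 10.0), ("e", 0.95, 0.05)],
]
for batch in stream:
    bank = evolve_bank(bank, batch, capacity=3)
    print([ex[0] for ex in bank])
```

In the second round the new example "e", a higher-quality near-duplicate of "a", displaces both "a" and "b", while the isolated examples "c" and "d" survive on their diversity.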
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
Innovation in Data Management: The concept of InsBank and the PIBE framework addresses a critical need for efficient, ongoing alignment of LLMs with evolving instruction data. Efficiency and Scalability: By retaining only necessary data and historical information, PIBE reduces computational and storage costs, making it suitable for large-scale applications. Comprehensive Diversity Evaluation: The representation-based diversity score effectively captures relationships between data points, impro
Lack of novelty: While the paper presents the InsBank concept and the PIBE framework, the methods employed largely combine existing techniques without substantial innovation. Using Affinity Propagation for diversity scoring and simple mathematical operations (addition and multiplication) to combine diversity and quality scores is a straightforward application of known methods. Clarity in Methodology: more detailed explanations of the experiments are needed to make the results reproducible. Clari
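Several reviews name Affinity Propagation (AP) as the basis of PIBE's diversity score. The sketch below is plain AP (Frey and Dueck, 2007) in pure Python on 1-D toy points. It uses negative squared distance as similarity and the median off-diagonal similarity as the default preference, and returns the standard exemplar evidence r(k,k) + a(k,k) for each point. This summary does not specify how PIBE turns AP messages into a diversity score. Reading this evidence as a per-example representativeness signal is an assumption made here, and `affinity_propagation` is a hypothetical helper, not the paper's code.

```python
from typing import List, Optional


def affinity_propagation(points: List[float], preference: Optional[float] = None,
                         damping: float = 0.5, iters: int = 200) -> List[float]:
    """Plain affinity propagation on 1-D points (hypothetical helper, needs >= 2 points).

    Returns r(k,k) + a(k,k) per point; a positive value marks an exemplar.
    """
    n = len(points)
    # Similarity: negative squared Euclidean distance.
    s = [[-(points[i] - points[k]) ** 2 for k in range(n)] for i in range(n)]
    if preference is None:
        off_diag = sorted(s[i][k] for i in range(n) for k in range(n) if i != k)
        preference = off_diag[len(off_diag) // 2]  # (upper) median similarity
    for k in range(n):
        s[k][k] = preference  # self-similarity controls how many exemplars emerge
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')], damped
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(vals[kk] for kk in range(n) if kk != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # a(k,k) = sum of positive r(i',k) for i' != k
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i, k})
        for k in range(n):
            pos = [max(0.0, r[ip][k]) for ip in range(n)]
            total = sum(pos[ip] for ip in range(n) if ip != k)
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    return [r[k][k] + a[k][k] for k in range(n)]


points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
evidence = affinity_propagation(points)
exemplars = [k for k, e in enumerate(evidence) if e > 0]
print(exemplars)
```

On two well-separated groups, AP is expected to pick the central point of each group as its exemplar. Unlike k-means, it does not need the number of clusters in advance, which is one reason it suits a bank whose contents keep changing.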
1. The authors consider an interesting setting: continually integrating new instruction data into instruction data selection for LLMs. 2. The proposed method achieves good performance on the AlpacaEval and MT-Bench benchmarks.
1. The downstream evaluation benchmarks are limited. The authors should evaluate on more downstream benchmarks, such as MMLU, to show the advantage of the proposed method.
* The developed method demonstrates superior performance over the considered baseline. * The idea of using affinity propagation to measure diversity is interesting.
* This paper has weaknesses in problem formulation, contribution, presentation, and experimental design. Please see the summary for details.
1. The introduction of InsBank and the PIBE framework brings a novel solution to the ongoing alignment and evolution of instruction data for LLMs. I think it is a relatively comprehensive and novel framework. 2. The adaptation of Affinity Propagation for diversity scoring is well suited to this progressive approach, enhancing the robustness and representation quality of selected subsets. 3. The authors flexibly integrated quality and diversity scores, allowing PIBE to adapt to various budget constra
1. The authors focus primarily on widely used datasets. I think PIBE could also be evaluated on more domain-specific datasets or with multiple evaluation methods. 2. The ensemble weights for quality and diversity are not well analyzed, so it remains unclear how sensitive PIBE's performance is to changes in these parameters.
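The weight-sensitivity concern could be probed with a simple sweep: vary the weight on quality versus diversity and measure how much the selected subset changes, for example with Jaccard overlap against a reference weight. The weighted-sum rule, the helpers `top_k` and `jaccard`, and the toy scores are assumptions for illustration, not the paper's experimental setup.

```python
from typing import List, Set


def top_k(quality: List[float], diversity: List[float], w: float, k: int) -> Set[int]:
    """Hypothetical helper: indices of the k best examples under w*q + (1-w)*d."""
    scores = [w * q + (1 - w) * d for q, d in zip(quality, diversity)]
    return set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Size of intersection over size of union; 1.0 means identical subsets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


quality = [0.9, 0.8, 0.2, 0.6, 0.1, 0.5]
diversity = [0.1, 0.4, 0.95, 0.6, 0.7, 0.3]
reference = top_k(quality, diversity, w=0.5, k=3)
for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(w, round(jaccard(top_k(quality, diversity, w, 3), reference), 2))
```

On this toy data every weight other than the reference keeps only two of the three reference examples (Jaccard 0.5). A flat curve near 1.0 would instead indicate that selection is insensitive to the weight.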
