TL;DR
CycleQD is a novel cyclic quality diversity approach that improves skill acquisition in large language models by focusing on one task at a time, outperforming conventional fine-tuning and matching the performance of larger models such as GPT-3.5-TURBO.
Contribution
Introduces CycleQD, a cyclic quality diversity method that combines a model-merging-based crossover with an SVD-based mutation, improving task-specific skill learning in large language models without the need to tune data mixing ratios.
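To make the two operators concrete, here is a minimal NumPy sketch on a single weight matrix. It is an illustration under stated assumptions, not the paper's implementation. It assumes task-arithmetic-style merging, where the child equals the base weights plus a weighted sum of the parents' task vectors (parent minus base). It also assumes a mutation that randomly rescales the singular values of the child's task vector while keeping its singular vectors. The function names, the weights `w_a`/`w_b`, and the noise scale `sigma` are hypothetical, and CycleQD's exact formulation may differ.

```python
import numpy as np


def merge_crossover(base, parent_a, parent_b, w_a, w_b):
    """Hypothetical task-arithmetic-style merge: add the parents' weighted
    task vectors (parent minus base) back onto the base weights."""
    return base + w_a * (parent_a - base) + w_b * (parent_b - base)


def svd_mutation(base, child, rng, sigma=0.1):
    """Hypothetical SVD-based mutation: rescale the singular values of the
    child's task vector by random factors centred on 1, keeping U and V."""
    delta = child - base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    s_mutated = s * (1.0 + sigma * rng.standard_normal(s.shape))
    # u * s_mutated scales column j of u by s_mutated[j] (same as u @ diag(s)).
    return base + (u * s_mutated) @ vt


rng = np.random.default_rng(0)
base = rng.standard_normal((4, 3))                    # stand-in for one weight matrix
expert_a = base + 0.1 * rng.standard_normal((4, 3))   # e.g. a coding expert
expert_b = base + 0.1 * rng.standard_normal((4, 3))   # e.g. an OS expert
child = merge_crossover(base, expert_a, expert_b, w_a=0.6, w_b=0.4)
mutant = svd_mutation(base, child, rng, sigma=0.1)
print(mutant.shape)  # (4, 3)
```

Operating on task vectors rather than raw weights keeps both operators anchored to the shared base model. With `sigma=0.0` the mutation reduces to an exact SVD reconstruction of the child.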
Findings
CycleQD surpasses traditional fine-tuning in coding, OS, and database tasks.
CycleQD achieves GPT-3.5-TURBO-level performance with fewer parameters.
The method also applies to image segmentation, demonstrating cross-domain versatility.
Abstract
Training large language models to acquire specific skills remains a challenging endeavor. Conventional training approaches often struggle with data distribution imbalances and inadequacies in objective functions that do not align well with task-specific performance. To address these challenges, we introduce CycleQD, a novel approach that leverages the Quality Diversity framework through a cyclic adaptation of the algorithm, along with a model-merging-based crossover and an SVD-based mutation. In CycleQD, each task's performance metric is alternated as the quality measure while the others serve as the behavioral characteristics. This cyclic focus on individual tasks allows for concentrated effort on one task at a time, eliminating the need for data ratio tuning and simplifying the design of the objective function. Empirical results from AgentBench indicate that applying CycleQD to…
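The cyclic alternation described in the abstract can be illustrated with a toy MAP-Elites loop. This is a hypothetical sketch, not the authors' algorithm. It keeps one archive per task. At each step one task's score is the quality measure, and the binned scores of the remaining tasks are the behavioral characteristics that select the grid cell. Several details are assumptions: the task names, the grid resolution `N_BINS`, the toy `evaluate` function, sampling parents only from the active archive, and inserting children only into that archive. The linear blend and Gaussian noise are simple stand-ins for model-merging crossover and SVD-based mutation.

```python
import random

TASKS = ["coding", "os", "db"]  # hypothetical task names
N_BINS = 5                      # hypothetical grid resolution per behavior axis


def evaluate(genome):
    """Toy stand-in for benchmark evaluation: one score in [0, 1] per task."""
    return {t: max(0.0, min(1.0, g)) for t, g in zip(TASKS, genome)}


def cell_key(scores, quality_task):
    """The non-quality tasks act as behavior characteristics: bin their scores."""
    return tuple(min(int(scores[t] * N_BINS), N_BINS - 1)
                 for t in TASKS if t != quality_task)


def insert(archive, genome, quality_task):
    """MAP-Elites rule: keep the genome if its cell is empty or it beats the
    incumbent on the current quality task."""
    scores = evaluate(genome)
    key = cell_key(scores, quality_task)
    incumbent = archive.get(key)
    if incumbent is None or scores[quality_task] > evaluate(incumbent)[quality_task]:
        archive[key] = genome
        return True
    return False


def cycle_qd(init_genomes, generations, rng):
    # One archive per task; each uses that task's score as its quality.
    archives = {t: {} for t in TASKS}
    for genome in init_genomes:
        for t in TASKS:
            insert(archives[t], genome, t)
    for step in range(generations):
        quality_task = TASKS[step % len(TASKS)]  # cycle the quality metric
        elites = list(archives[quality_task].values())
        parent_a, parent_b = rng.choice(elites), rng.choice(elites)
        w = rng.random()
        # Toy crossover (linear blend) plus Gaussian mutation, standing in for
        # model-merging crossover and SVD-based mutation.
        child = [w * a + (1 - w) * b + rng.gauss(0.0, 0.05)
                 for a, b in zip(parent_a, parent_b)]
        insert(archives[quality_task], child, quality_task)
    return archives


rng = random.Random(0)
population = [[rng.random() for _ in TASKS] for _ in range(8)]
archives = cycle_qd(population, generations=30, rng=rng)
for task, archive in archives.items():
    print(task, "elites:", len(archive))
```

Because only one metric is the quality measure at any step, no weighted multi-task objective or data mixing ratio has to be chosen. The other tasks only decide which cell a candidate competes in.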
Peer Reviews
Decision: ICLR 2025 Poster
The paper presents an innovative approach to LLM skill acquisition, uniquely applying the Quality Diversity paradigm to cycle through task-specific optimizations, which is quite novel. The experimental results on the proposed CycleQD framework show a substantial performance gain, validating the model’s effectiveness and overall performance. Overall, CycleQD introduces a scalable method to merge agent skills into LLMs, addressing critical challenges in agent-based LLM design.
Although CycleQD is designed for agent skill acquisition in LLMs, the experiments predominantly focus on computer science tasks, such as OS and DB. Demonstrating its applicability to agents in other fields would further strengthen this paper. The methodology, though effective, involves a multi-step process (crossover, mutation, cyclic quality alternation) that may be complex. The authors may need to provide a clearer illustration of the methodology, such as an overview figure of the method pipeline. The…
This work appears to be very original: (1) it is different from conventional fine-tuning and even model merging methods, and (2) it alleviates certain design decisions, like data mixing ratios and different objectives, when fine-tuning on multiple tasks. I understand that previous evolutionary approaches don't update the weights directly and this method proposes to do so (using the framing of model merging). I think this is a significant contribution. I think the specific adaptations they propo…
I think the writing is mostly clear, but this paper assumes some extra background knowledge about evolutionary methods. I would really appreciate it if the authors (briefly) introduce such approaches and terminology from first principles, before the details of their augmentations. It would also be great if the authors could again motivate the need for evolutionary approaches (rather than fine-tuning) in training or merging models. Is it just the avoidance of design decisions that I mentioned ab…
- The paper discusses a critical problem: achieving superior performance on multiple skills while retaining language capabilities.
- The proposed method is novel.
Overall, the major weakness of this paper is its presentation. For readers not familiar with evolutionary computation, the paper is hard to follow.
1. [Presentation] Problems in understanding the proposed method.
   - Explain the evolution of Figure 4 by referring to Algorithm 1. The figures are meant to give direct intuitions about how the algorithm works, but the current delivery fails to achieve this purpose.
   - In the elite sampling (line 217), what does the formula intuitively mean? Sp…
Code & Models
- SakanaAI/Llama-3-8B-Instruct-OS-Expert (model) · 10 downloads · 1 like
- SakanaAI/Llama-3-8B-Instruct-DB-Expert (model) · 10 downloads
- SakanaAI/Llama-3-8B-Instruct-Coding-Expert (model) · 12 downloads · 13 likes
- SakanaAI/Llama-3-8B-Instruct-CycleQD-CS (model) · 7 downloads · 1 like
- RichardErkhov/SakanaAI_-_Llama-3-8B-Instruct-CycleQD-CS-8bits (model) · 1 download
- RichardErkhov/SakanaAI_-_Llama-3-8B-Instruct-CycleQD-CS-awq (model) · 1 download
- RichardErkhov/SakanaAI_-_Llama-3-8B-Instruct-Coding-Expert-8bits (model)
- RichardErkhov/SakanaAI_-_Llama-3-8B-Instruct-Coding-Expert-awq (model) · 1 download
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Attention Dropout · Softmax · Multi-Head Attention · Linear Warmup With Cosine Annealing · Adam
