MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models

Zhen Zhang; Yifan Yang; Kai Zhen; Nathan Susanj; Athanasios Mouchtaris; Siegfried Kunzmann; Zheng Zhang

arXiv:2502.11513·cs.LG·February 18, 2025

TL;DR

MaZO is a zeroth-order optimization framework for multi-task fine-tuning of large language models. It reduces memory usage and gradient variance and outperforms existing multi-task methods, including first-order ones.

Contribution

MaZO is the first framework tailored for multi-task LLM fine-tuning under zeroth-order optimization, addressing gradient variance and task conflict challenges.

Findings

1. MaZO achieves state-of-the-art multi-task fine-tuning performance.
2. MaZO surpasses first-order multi-task learning methods.
3. MaZO reduces memory usage and gradient variance in ZO optimization.

Abstract

Large language models have demonstrated exceptional capabilities across diverse tasks, but their fine-tuning demands significant memory, posing challenges for resource-constrained environments. Zeroth-order (ZO) optimization provides a memory-efficient alternative by eliminating the need for backpropagation. However, ZO optimization suffers from high gradient variance, and prior research has largely focused on single-task learning, leaving its application to multi-task learning unexplored. Multi-task learning is crucial for leveraging shared knowledge across tasks to improve generalization, yet it introduces unique challenges under ZO settings, such as amplified gradient variance and collinearity. In this paper, we present MaZO, the first framework specifically designed for multi-task LLM fine-tuning under ZO optimization. MaZO tackles these challenges at the parameter level through two…
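
The backpropagation-free idea mentioned in the abstract can be illustrated with a generic two-point (SPSA-style) gradient estimator, which needs only two forward evaluations of the loss per step. This is a minimal sketch under stated assumptions, not MaZO itself: the visible abstract does not describe the masking procedure, so none is shown here. The names `zo_gradient`, `quadratic_loss`, and `zo_sgd`, the toy quadratic loss, and all hyperparameter values are hypothetical choices made for illustration.

```python
import numpy as np


def zo_gradient(loss_fn, theta, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate (generic SPSA-style sketch).

    Uses two forward evaluations of loss_fn and no backpropagation.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Random Gaussian perturbation direction with the same shape as theta.
    z = rng.standard_normal(theta.shape)
    loss_plus = loss_fn(theta + eps * z)
    loss_minus = loss_fn(theta - eps * z)
    # Central finite difference along z, projected back onto z.
    return (loss_plus - loss_minus) / (2.0 * eps) * z


def quadratic_loss(theta):
    """Toy loss (an assumption for this demo): 0.5 * ||theta||^2."""
    return 0.5 * float(np.sum(theta ** 2))


def zo_sgd(loss_fn, theta, lr=0.05, steps=500, seed=0):
    """Plain SGD driven by the zeroth-order estimate above."""
    rng = np.random.default_rng(seed)
    theta = theta.copy()
    for _ in range(steps):
        theta -= lr * zo_gradient(loss_fn, theta, rng=rng)
    return theta


if __name__ == "__main__":
    theta0 = np.ones(5)
    final = zo_sgd(quadratic_loss, theta0)
    print("initial loss:", quadratic_loss(theta0))
    print("final loss:  ", quadratic_loss(final))
```

Each estimate points along a single random direction, so its variance grows with the number of parameters. That is the gradient-variance problem the abstract refers to, and it is why the step size in a sketch like this has to stay small.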

Taxonomy

Topics: Topic Modeling