Monkey Jump: MoE-Style PEFT for Efficient Multi-Task Learning

Nusrat Jahan Prottasha; Md Kowsher; Chun-Nam Yu; Chen Chen; Ozlem Garibay

arXiv:2601.06356 · cs.LG · January 13, 2026

Open Access

TL;DR

Monkey Jump introduces a novel, parameter-efficient multi-task learning method that leverages existing adapters as implicit experts, using clustering-based routing to improve expressivity without extra trainable parameters.

Contribution

The paper proposes a gradient-free routing technique that treats the adapters already present in each Transformer block as implicit experts, improving multi-task learning efficiency without additional trainable parameters.

Findings

1. Achieves competitive performance with fewer trainable parameters.

2. Reduces memory consumption by up to 48%.

3. Speeds up training by 1.5 to 2 times.

Abstract

Mixture-of-experts variants of parameter-efficient fine-tuning enable per-token specialization, but they introduce additional trainable routers and expert parameters, increasing memory usage and training cost. This undermines the core goal of parameter-efficient fine-tuning. We propose Monkey Jump, a method that brings mixture-of-experts-style specialization to parameter-efficient fine-tuning without introducing extra trainable parameters for experts or routers. Instead of adding new adapters as experts, Monkey Jump treats the adapters already present in each Transformer block (such as query, key, value, up, and down projections) as implicit experts and routes tokens among them. Routing is performed using k-means clustering with exponentially moving averaged cluster centers, requiring no gradients and no learned parameters. We theoretically show that token-wise routing increases…
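
The routing idea described in the abstract (k-means assignment with exponentially moving averaged cluster centers, no gradients, no learned parameters) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `EMAKMeansRouter`, the `momentum` value, the squared Euclidean distance, and the hard nearest-center assignment are all assumptions made for illustration, and each cluster index is simply taken to stand for one of the existing adapters.

```python
import random


class EMAKMeansRouter:
    """Hypothetical gradient-free router: one cluster center per implicit expert."""

    def __init__(self, num_experts, dim, momentum=0.9, seed=0):
        rng = random.Random(seed)
        # Centers are plain buffers, not trainable parameters (assumed random init).
        self.centers = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)
        ]
        self.momentum = momentum  # assumed EMA decay, not a value from the paper

    def route(self, tokens):
        """Return, for each token vector, the index of its nearest center."""

        def sq_dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return [
            min(range(len(self.centers)), key=lambda k: sq_dist(t, self.centers[k]))
            for t in tokens
        ]

    def update(self, tokens, assignments):
        """Move each center toward the mean of its assigned tokens via EMA."""
        for k, center in enumerate(self.centers):
            members = [t for t, a in zip(tokens, assignments) if a == k]
            if not members:
                continue  # no tokens routed here in this batch; leave center as is
            mean = [sum(col) / len(members) for col in zip(*members)]
            self.centers[k] = [
                self.momentum * c + (1.0 - self.momentum) * m
                for c, m in zip(center, mean)
            ]


# Tiny demonstration with two hand-set centers and two 2-D "tokens".
router = EMAKMeansRouter(num_experts=2, dim=2, momentum=0.5)
router.centers = [[0.0, 0.0], [10.0, 10.0]]
tokens = [[1.0, 1.0], [9.0, 9.0]]
assignments = router.route(tokens)   # [0, 1]
router.update(tokens, assignments)   # centers become [[0.5, 0.5], [9.5, 9.5]]
```

In this sketch the only state is the list of centers, which is updated by averaging rather than by backpropagation, so the router contributes nothing to the trainable parameter count. How the selected index gates the query, key, value, up, and down adapters is left out here because the truncated abstract does not specify it.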

Taxonomy

Topics: Advanced Neural Network Applications · Image and Video Quality Assessment · Advanced Data Compression Techniques