VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse

Ying Nie; Kai Han; Hongguang Li; Hang Zhou; Tianyu Guo; Enhua Wu; Xinghao Chen; Yunhe Wang

arXiv:2512.14531 · cs.CL · February 3, 2026



Open Access

TL;DR

VersatileFFN introduces a flexible, parameter-efficient FFN architecture for LLMs that adaptively reuses parameters across width and depth to improve performance without increasing memory costs.

Contribution

The paper proposes a novel adaptive FFN with two pathways, one reusing parameters in width and one in depth, which makes better use of model capacity within a fixed parameter budget.
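To make the fixed-budget claim concrete, the arithmetic below compares the stored FFN weights of a dense FFN, a conventional mixture of experts (MoE) that keeps separate expert copies, and a single shared FFN reused as sub-experts. The layer sizes, the plain two-matrix FFN (no biases, no gating matrix), and the linear router are illustrative assumptions, not figures from the paper.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    # A plain FFN stores W_in (d_model x d_ff) and W_out (d_ff x d_model); biases ignored.
    return 2 * d_model * d_ff


def moe_params(d_model: int, d_ff: int, n_experts: int) -> int:
    # A conventional MoE stores n_experts full FFN copies plus a d_model x n_experts router.
    return n_experts * ffn_params(d_model, d_ff) + d_model * n_experts


def shared_reuse_params(d_model: int, d_ff: int, n_sub: int) -> int:
    # Reusing one FFN as n_sub sub-experts (or recursively in depth) adds no FFN weights;
    # only a hypothetical d_model x n_sub router is counted here.
    return ffn_params(d_model, d_ff) + d_model * n_sub


d_model, d_ff, n = 1024, 4096, 8  # hypothetical sizes
print(ffn_params(d_model, d_ff))              # 8388608
print(moe_params(d_model, d_ff, n))           # 67117056
print(shared_reuse_params(d_model, d_ff, n))  # 8396800
```

Under these assumptions the MoE stores roughly 8× the FFN weights, while the shared-reuse layer stays within about 0.1% of the dense FFN.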

Findings

01

Improves performance across multiple benchmarks.

02

Matches computation to token complexity through adaptive routing.

03

Enhances model capacity without increasing memory costs.

Abstract

The rapid scaling of Large Language Models (LLMs) has achieved remarkable performance, but it also leads to prohibitive memory costs. Existing parameter-efficient approaches such as pruning and quantization mainly compress pretrained models without enhancing architectural capacity, thereby hitting the representational ceiling of the base model. In this work, we propose VersatileFFN, a novel feed-forward network (FFN) that enables flexible reuse of parameters in both width and depth dimensions within a fixed parameter budget. Inspired by the dual-process theory of cognition, VersatileFFN comprises two adaptive pathways: a width-versatile path that generates a mixture of sub-experts from a single shared FFN, mimicking sparse expert routing without increasing parameters, and a depth-versatile path that recursively applies the same FFN to emulate deeper processing for complex tokens. A…
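The abstract describes the two pathways only at a high level and is truncated before explaining how they are trained or combined, so the NumPy sketch below is one hypothetical reading, not the authors' implementation. The ReLU FFN, the equal-size neuron slices used as sub-experts, the linear top-k softmax router (`router_w`), the residual recursion, and the hand-supplied per-token loop counts (`n_loops`) are all assumptions, and every name (`SharedFFN`, `make_slices`, `width_path`, `depth_path`) is hypothetical.

```python
import numpy as np


class SharedFFN:
    """One two-layer ReLU FFN; its weights are the only FFN parameters stored."""

    def __init__(self, d_model: int, d_ff: int, rng: np.random.Generator):
        self.w_in = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
        self.w_out = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff)

    def __call__(self, x: np.ndarray, cols: slice = slice(None)) -> np.ndarray:
        # `cols` restricts the pass to a subset of hidden neurons (one sub-expert).
        h = np.maximum(x @ self.w_in[:, cols], 0.0)
        return h @ self.w_out[cols, :]


def make_slices(d_ff: int, n_sub: int) -> list[slice]:
    # Hypothetical: split the hidden layer into n_sub equal, contiguous slices.
    assert d_ff % n_sub == 0, "this sketch assumes d_ff is divisible by n_sub"
    size = d_ff // n_sub
    return [slice(i * size, (i + 1) * size) for i in range(n_sub)]


def width_path(ffn, router_w, x, n_sub, top_k):
    """Width reuse (hypothetical): route each token to its top_k sub-experts,
    all carved from the same shared FFN, and mix them with softmax gates."""
    slices = make_slices(ffn.w_in.shape[1], n_sub)
    logits = x @ router_w                          # (tokens, n_sub)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(logits[t])[-top_k:]       # highest-scoring sub-experts
        gates = np.exp(logits[t, top] - logits[t, top].max())
        gates /= gates.sum()                       # softmax over the selected ones
        for gate, e in zip(gates, top):
            out[t] += gate * ffn(x[t:t + 1], slices[e])[0]
    return out


def depth_path(ffn, x, n_loops):
    """Depth reuse (hypothetical): apply the same FFN n_loops[t] times to token t
    with a residual connection, so tokens marked as harder get more passes."""
    out = x.copy()
    for t, k in enumerate(n_loops):
        h = x[t:t + 1]
        for _ in range(k):
            h = h + ffn(h)
        out[t] = h[0]
    return out


# Usage with toy, hypothetical sizes.
rng = np.random.default_rng(0)
d_model, d_ff, n_sub = 16, 64, 4
ffn = SharedFFN(d_model, d_ff, rng)
router_w = rng.standard_normal((d_model, n_sub))   # the only extra parameters
x = rng.standard_normal((3, d_model))              # 3 tokens
y_wide = width_path(ffn, router_w, x, n_sub=n_sub, top_k=2)
y_deep = depth_path(ffn, x, n_loops=[1, 2, 4])     # loop counts chosen by hand here
```

Because every sub-expert is a slice of the same hidden layer, summing all slices with unit weights reproduces the full FFN output, and neither path allocates new FFN weights: the stored parameters are `w_in`, `w_out`, and the small `router_w`, whatever `n_sub` or `n_loops` are. The abstract indicates that deeper processing is reserved for complex tokens, but the mechanism that decides this is not in the excerpt, which is why `n_loops` is supplied by hand.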

