PolyGLU: State-Conditional Activation Routing in Transformer Feed-Forward Networks
Daniel Nobrega Medeiros

TL;DR
PolyGLU adds input-conditioned activation routing to transformer feed-forward networks: each FFN neuron selects among K=4 activation functions. With minimal parameter overhead and no explicit regularization, routing specializes by layer. The 597M-parameter model was trained on ~10B tokens using a single NVIDIA A100 GPU.
Contribution
The paper proposes PolyGLU, a differentiable activation routing method for transformer feed-forward networks, and shows that near-deterministic routing and layer-wise specialization emerge without any explicit sparsity loss or entropy regularization.
Findings
Routing converges to near-deterministic choices
Early layers prefer GELU, deep layers favor Tanh
Achieves substantial performance with fewer training tokens
Abstract
Biological neural systems employ diverse neurotransmitters -- glutamate, GABA, dopamine, acetylcholine -- to implement distinct signal-processing modalities within shared neural circuits. In contrast, modern transformers apply a single fixed activation function across all feed-forward neurons. We introduce PolyGLU (Polychromatic Gated Linear Unit), a drop-in replacement for SwiGLU that enables each FFN neuron to dynamically route among K=4 activation functions via a differentiable mechanism combining learned static preferences with input-conditioned gating, trained end-to-end with Gumbel-Softmax. We train PolychromaticLM, a 597M-parameter transformer, on ~10B tokens using a single NVIDIA A100 GPU. Our key finding is emergent routing behavior: without any explicit sparsity loss or entropy regularization, the routing mechanism converges to near-deterministic activation selections (mean…
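The routing idea in the abstract can be illustrated with a minimal single-neuron sketch. This is not the authors' implementation: the set of four activations (ReLU, Tanh, SiLU, GELU), the additive combination of static and input-conditioned logits, and all function and parameter names below are assumptions made for illustration. Only the overall recipe (K=4 candidates, static preferences plus input-conditioned gating, Gumbel-Softmax relaxation) comes from the abstract.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def silu(x):
    return x / (1.0 + math.exp(-x))


def gelu(x):
    # Exact GELU via the Gaussian error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Assumed candidate set (K = 4); the abstract names only GELU and Tanh.
ACTIVATIONS = [relu, math.tanh, silu, gelu]


def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    es = [math.exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]


def gumbel_softmax(logits, tau, rng):
    # Add Gumbel(0, 1) noise, -log(-log(u)), then apply a tempered softmax.
    noisy = []
    for z in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((z - math.log(-math.log(u))) / tau)
    return softmax(noisy)


def routed_activation(pre_act, static_logits, dynamic_logits, tau=1.0, rng=None):
    """Mix K activations for one neuron.

    static_logits: learned per-neuron preferences (hypothetical name).
    dynamic_logits: input-conditioned gate outputs (hypothetical name).
    With rng=None the noise is dropped (evaluation-style softmax).
    """
    logits = [s + d for s, d in zip(static_logits, dynamic_logits)]
    if rng is None:
        weights = softmax([z / tau for z in logits])
    else:
        weights = gumbel_softmax(logits, tau, rng)
    out = sum(w * f(pre_act) for w, f in zip(weights, ACTIVATIONS))
    return out, weights


# A strongly peaked preference behaves like a near-deterministic choice of Tanh.
out, weights = routed_activation(0.5, [0.0, 20.0, 0.0, 0.0], [0.0] * 4)
```

In a full gated unit, this routed mixture would take the place of the single SiLU that SwiGLU applies to its gate projection. Lowering `tau` sharpens the weights toward a one-hot selection, which is one way a relaxed mixture can approach the near-deterministic routing the abstract reports.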
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Software-Defined Networks and 5G
