Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Zhijian Zhuo, Ya Wang, Yutao Zeng, Xiaoqing Li, Xun Zhou, and Jinwen Ma

TL;DR
This paper introduces Polynomial Composition Activations (PolyCom), a novel family of activation functions for transformers that improves expressivity and efficiency. Both theoretical analysis and empirical validation show that PolyCom improves the performance of large language models.
Contribution
The paper proposes PolyCom, a new activation function with proven optimal approximation capabilities, and demonstrates its effectiveness in large language model pre-training.
Findings
PolyCom achieves the optimal approximation rate.
LLMs with PolyCom outperform those with traditional activations.
PolyCom improves accuracy and convergence in large language models.
Abstract
Transformers have found extensive applications across various domains due to their powerful fitting capabilities. This success can be partially attributed to their inherent nonlinearity. Thus, in addition to the ReLU function employed in the original transformer architecture, researchers have explored alternative modules such as GeLU and SwishGLU to enhance nonlinearity and thereby augment representational capacity. In this paper, we propose a novel category of polynomial composition activations (PolyCom), designed to optimize the dynamics of transformers. Theoretically, we provide a comprehensive mathematical analysis of PolyCom, highlighting its enhanced expressivity and efficacy relative to other activation functions. Notably, we demonstrate that networks incorporating PolyCom achieve the optimal approximation rate, indicating that PolyCom networks require minimal parameters…
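The abstract does not spell out the functional form of a polynomial composition activation. As a rough illustration only, the sketch below assumes one plausible instance: a polynomial p(t) = a_0 + a_1 t + ... + a_r t^r composed with ReLU and applied element-wise. The function names `poly_relu` and `apply_activation` and the coefficient values are hypothetical and are not taken from the paper; in a real model the coefficients would presumably be learnable parameters.

```python
from typing import List, Sequence


def relu(x: float) -> float:
    """Standard rectified linear unit."""
    return x if x > 0.0 else 0.0


def poly_relu(x: float, coeffs: Sequence[float]) -> float:
    """Hypothetical polynomial-composed-with-ReLU activation (illustrative only).

    Computes sum_i coeffs[i] * relu(x) ** i, i.e. the polynomial
    p(t) = a_0 + a_1 t + ... + a_r t^r evaluated at t = relu(x).
    Coefficients are plain floats here rather than trained parameters.
    """
    t = relu(x)
    # Horner's rule: evaluate the polynomial from the highest degree down.
    result = 0.0
    for a in reversed(coeffs):
        result = result * t + a
    return result


def apply_activation(xs: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Apply the activation element-wise, as an MLP nonlinearity would."""
    return [poly_relu(x, coeffs) for x in xs]


if __name__ == "__main__":
    # Hypothetical third-order coefficients (a_0, a_1, a_2, a_3).
    coeffs = [0.0, 1.0, 0.5, 0.25]
    print(apply_activation([-2.0, 0.0, 1.0, 2.0], coeffs))  # [0.0, 0.0, 1.75, 6.0]
```

With coefficients (0, 1) the sketch reduces to plain ReLU, which makes it easy to see how higher-order terms add curvature on the positive side while negative inputs still map to a_0.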
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Polynomial Composition Activations
