giMLPs: Gate with Inhibition Mechanism in MLPs
Cheng Kang, Jindřich Prokop, Lei Tong, Huiyu Zhou, Yong Hu, Daniel Novak

TL;DR
This paper introduces giMLPs, a novel architecture with gating and inhibition mechanisms that enhance model performance on image and language tasks without additional pretraining.
Contribution
The paper proposes the gate with inhibition MLP (giMLP) architecture, improving model adaptability and feature restriction, applicable to both vision and language models.
Findings
giMLP achieves comparable ImageNet accuracy to CycleMLP.
The techniques significantly improve NLU downstream task performance.
Gate with inhibition enhances model capacity without extra pretraining.
Abstract
This paper presents a new model architecture, gate with inhibition MLP (giMLP). The gate with inhibition on CycleMLP (gi-CycleMLP) can produce equal performance on the ImageNet classification task, and it also improves the BERT, RoBERTa, and DeBERTaV3 models, relying on two novel techniques. The first is the gating MLP, where matrix multiplications between the MLP and the trunk Attention input further adjust the models' adaptation. The second is inhibition, which inhibits or enhances the branch adjustment; as the inhibition level increases, it imposes a stronger restriction on the models' features. We show that gi-CycleMLP with a lower inhibition level can be competitive with the original CycleMLP in terms of ImageNet classification accuracy. In addition, we also show through a comprehensive empirical study that these techniques significantly improve the performance of fine-tuning…
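The two techniques named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' formulation: the abstract does not give the equations, so everything here is an assumption for illustration. The sketch assumes the gate is an element-wise product between a trunk projection and a branch projection, and that inhibition subtracts a threshold (the inhibition level times the largest absolute branch value per row) before a ReLU. The names `inhibited_gate`, `gi_mlp`, `w_trunk`, `w_gate`, `w_out`, and `inhibition_level` are hypothetical.

```python
import numpy as np


def gelu(z):
    # Tanh approximation of GELU.
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))


def inhibited_gate(z, inhibition_level):
    # ASSUMPTION: inhibition is modeled as a per-row threshold subtracted
    # before a ReLU. A positive level suppresses weak branch activations,
    # a negative level shifts them up (enhancement), zero is a plain ReLU.
    threshold = inhibition_level * np.max(np.abs(z), axis=-1, keepdims=True)
    return np.maximum(z - threshold, 0.0)


def gi_mlp(x, w_trunk, w_gate, w_out, inhibition_level=0.0):
    # ASSUMPTION: the gate multiplies the trunk projection element-wise.
    trunk = gelu(x @ w_trunk)                               # (n, hidden)
    gate = inhibited_gate(x @ w_gate, inhibition_level)     # (n, hidden)
    return (trunk * gate) @ w_out                           # (n, d_model)


# Toy usage with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
n, d_model, hidden = 4, 8, 16
x = rng.normal(size=(n, d_model))
w_trunk = rng.normal(size=(d_model, hidden))
w_gate = rng.normal(size=(d_model, hidden))
w_out = rng.normal(size=(hidden, d_model))

for level in (0.0, 0.3, 0.6):
    gate = inhibited_gate(x @ w_gate, level)
    print(level, int(np.count_nonzero(gate)))
```

Under these assumptions, raising the inhibition level can only zero out more gate entries, which is one plausible reading of the "stronger restriction" the abstract describes; at a level of 1.0 the gate closes completely and the block outputs zeros.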
Taxonomy
Topics: Topic Modeling · Advanced Neural Network Applications · Machine Learning and Algorithms
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Adam · WordPiece · Weight Decay · Linear Warmup With Linear Decay · Residual Connection
