Dynamics of feed forward induced interference training
Shirui Tang

TL;DR
This paper proposes a physics-inspired training method for transformer models like GPT, viewing self-attention as an energy system and optimizing signal interference to improve learning, especially in few-shot scenarios.
Contribution
It introduces a novel physics-based training approach that models self-attention as an Ising-like energy system, differing from traditional backpropagation methods.
Findings
Demonstrates learning behavior similar to humans in few-shot scenarios
Shows different dynamics from backpropagation-based networks
Achieves promising results on a 4-class MNIST classification task
Abstract
Updating perceptron models with backpropagation has become routine in deep learning. A continuous feed-forward procedure is required for backward propagation to function properly. Because the routine explanation leaves the underlying physical interpretation of transformer-based models such as GPT in doubt, a new training method is proposed to keep the physics self-consistent. The GPT model is treated as a space-time diagram and the worldlines of signals are traced, identifying the possible paths that signals can take for a self-attention event to occur. With a slight modification, self-attention can be viewed as an Ising model interaction, which allows the objective to be designed as the energy of the system. The target is treated as an external magnetic field inducing signals modeled as magnetic dipoles. A probability network is designed to pilot input signals travelling…
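The abstract does not state the exact energy function, so the following is only a minimal sketch of the textbook Ising energy, E = -(1/2) Σ J_ij s_i s_j - Σ h_i s_i, with the target playing the role of the external field h. The function names and the mapping from attention scores to the coupling matrix J (symmetrized scaled dot products) are assumptions made for illustration, not the paper's method.

```python
import numpy as np

def ising_energy(spins, coupling, field):
    """Textbook Ising energy: -1/2 * s^T J s - h . s (the 1/2 avoids double counting pairs)."""
    spins = np.asarray(spins, dtype=float)
    coupling = np.asarray(coupling, dtype=float)
    field = np.asarray(field, dtype=float)
    pair_term = -0.5 * spins @ coupling @ spins
    field_term = -np.dot(field, spins)
    return float(pair_term + field_term)

def attention_like_coupling(queries, keys):
    """Hypothetical coupling matrix: symmetrized scaled dot-product scores, no self-interaction."""
    queries = np.asarray(queries, dtype=float)
    keys = np.asarray(keys, dtype=float)
    scores = queries @ keys.T / np.sqrt(queries.shape[1])
    coupling = 0.5 * (scores + scores.T)  # Ising couplings are symmetric
    np.fill_diagonal(coupling, 0.0)       # drop self-coupling terms
    return coupling

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(4, 8))           # 4 signals with 8 features each (assumed sizes)
    k = rng.normal(size=(4, 8))
    J = attention_like_coupling(q, k)
    s = np.array([1.0, -1.0, 1.0, 1.0])   # signals as up/down dipoles
    h = np.full(4, 0.5)                   # target as a uniform external field (assumption)
    print(ising_energy(s, J, h))
```

In this picture, configurations whose spins align with the field and with positively coupled neighbours have lower energy, which is the sense in which an energy can serve as a training objective.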
Taxonomy
Topics: Neural Networks and Applications · Computational Physics and Python Applications · Gaussian Processes and Bayesian Inference
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Weight Decay · Linear Warmup With Cosine Annealing · Dropout · Adam · Layer Normalization
