Why Inference in Large Models Becomes Decomposable After Training
Jidong Jin

TL;DR
This paper reveals that large AI models' inference systems become decomposable after training due to localized gradient updates, enabling efficient parallel inference without altering model functionality.
Contribution
It introduces a post-training statistical criterion and a structural annealing procedure that identify stable, independent substructures and use them for decomposable inference.
Findings
Inference systems are structurally non-uniform post-training.
Gradient updates are highly localized and leave many dependencies unchanged.
Decomposable inference enables parallel processing without model modification.
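As a minimal illustration of the last finding, the sketch below assumes a layer whose weight matrix has already been reduced to independent blocks (the block shapes and random values are made up for the example, not taken from the paper). It checks that applying each block to its own slice of the input gives the same output as the equivalent dense matrix, which is why the blocks could be evaluated in parallel.

```python
import numpy as np


def blockwise_matvec(blocks, x):
    """Apply each independent block to its own slice of x, then concatenate."""
    outputs = []
    start = 0
    for block in blocks:
        width = block.shape[1]
        # Each product touches only its own input slice, so the loop
        # iterations are independent and could run on separate workers.
        outputs.append(block @ x[start:start + width])
        start += width
    return np.concatenate(outputs)


def dense_from_blocks(blocks):
    """Assemble the equivalent block-diagonal dense matrix for comparison."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    dense = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        dense[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return dense


# Hypothetical block shapes, chosen only for the demonstration.
rng = np.random.default_rng(0)
blocks = [rng.normal(size=(3, 4)), rng.normal(size=(2, 2)), rng.normal(size=(5, 3))]
x = rng.normal(size=9)
dense = dense_from_blocks(blocks)

print(np.allclose(dense @ x, blockwise_matvec(blocks, x)))
```

The dense product costs 10 x 9 = 90 multiplications here, while the blockwise version needs 12 + 4 + 15 = 31 and leaves the weights inside each block untouched.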
Abstract
Inference in large-scale AI models is typically performed on dense parameter matrices, leading to inference cost and system complexity that scale unsustainably with model size. This limitation does not arise from insufficient model capacity, but from treating post-training inference systems as monolithic operators while ignoring internal structures formed during learning. We show that gradient update events in large models are highly localized and selective, leaving many parameter dependencies statistically indistinguishable from their initialization distribution after training. As a result, post-training inference systems are structurally non-uniform and inherently decomposable. Based on this observation, we introduce a post-training statistical criterion and a structural annealing procedure that removes unsupported dependencies and reveals stable, independent substructures. This work…
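The abstract does not spell out the statistical criterion or the annealing schedule, so the following is only a sketch of the general idea under stated assumptions: a zero-mean initialization with known standard deviation `init_std`, a simple magnitude threshold (`z_threshold`, a hypothetical parameter) standing in for the paper's criterion, and a single pruning pass standing in for structural annealing. Entries that still look like draws from the initialization are removed, and the remaining dependencies are grouped into independent substructures by finding connected components.

```python
import numpy as np


def prune_unsupported(w_trained, init_std, z_threshold=3.0):
    """Zero entries that lie within z_threshold * init_std of zero.

    Assumption: such entries are treated as indistinguishable from the
    zero-mean initialization. This is a stand-in test, not the paper's.
    """
    pruned = w_trained.copy()
    pruned[np.abs(w_trained) <= z_threshold * init_std] = 0.0
    return pruned


def independent_substructures(w_pruned):
    """Return (output_rows, input_cols) groups linked by nonzero weights."""
    n_out, n_in = w_pruned.shape
    parent = list(range(n_out + n_in))  # union-find over rows, then columns

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for r, c in zip(*np.nonzero(w_pruned)):
        root_r, root_c = find(int(r)), find(n_out + int(c))
        if root_r != root_c:
            parent[root_r] = root_c

    groups = {}
    for node in range(n_out + n_in):
        rows, cols = groups.setdefault(find(node), ([], []))
        if node < n_out:
            rows.append(node)
        else:
            cols.append(node - n_out)
    # Keep only groups that connect at least one output to one input.
    return [(rows, cols) for rows, cols in groups.values() if rows and cols]


# Toy "trained" matrix: init-scale noise everywhere, large updates in two blocks.
init_std = 0.02
rng = np.random.default_rng(0)
# Noise is clipped to 2 * init_std so the toy example is deterministic.
w = np.clip(rng.normal(0.0, init_std, size=(6, 6)), -2 * init_std, 2 * init_std)
w[0:3, 0:3] += 1.0
w[3:6, 3:6] -= 1.0

parts = independent_substructures(prune_unsupported(w, init_std))
print(parts)
```

On this toy matrix the pruning step removes the off-block noise, leaving two groups: outputs 0 to 2 with inputs 0 to 2, and outputs 3 to 5 with inputs 3 to 5. Real trained weights would not separate this cleanly, and the threshold would need to be chosen with care.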
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
