High-Dimensional Search, Low-Dimensional Solution: Decoupling Optimization from Representation
Yusuf Kalyoncuoglu, Ratmir Miftachov

TL;DR
This paper argues that the redundancy of large models mainly serves optimization rather than representation, and shows that their representations can be compressed by up to 16x with data-independent random projections at a performance cost of around 1%, enabling more efficient deployment.
Contribution
It introduces a novel approach to decouple optimization from representation using random projections, revealing the intrinsic robustness of solution manifolds in large models.
Findings
ResNet, ViT, and BERT representations can be compressed by up to 16x with a performance degradation of around 1%.
Random projections achieve comparable results to PCA and learned baselines.
The solution manifold in large models is intrinsically robust.
Abstract
State-of-the-art models rely on massive widths despite exhibiting low Intrinsic Dimension (ID). We posit that this redundancy serves the non-convex optimization search rather than the final representation. We validate this hypothesis by decoupling the solution geometry via data-independent random projections, demonstrating that ResNet, ViT, and BERT representations can be compressed by up to 16x with negligible performance degradation of around 1%. Notably, these oblivious projections achieve parity with PCA and learned baselines, confirming the solution manifold is intrinsically robust. These findings establish the foundation for Subspace-Native Distillation: a paradigm where student models target this intrinsic manifold directly, bypassing the high-dimensional optimization bottleneck to realize the vision of "Train Big, Deploy Small".
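As a rough illustration of the comparison the abstract describes, the sketch below compresses a feature matrix 16x with a data-independent Gaussian random projection and with a PCA baseline, then checks how well pairwise distances survive. It is not the authors' pipeline: the synthetic features (width 768, lying near a 20-dimensional subspace), the Gaussian form of the projection, the distance-ratio metric, and all function names are assumptions made for this example.

```python
import numpy as np

def random_projection(features, factor=16, seed=0):
    """Project (n, d) features to (n, d // factor) with a Gaussian matrix."""
    d = features.shape[1]
    k = d // factor
    rng = np.random.default_rng(seed)
    # Data-independent: the matrix is drawn without looking at `features`.
    proj = rng.standard_normal((d, k)) / np.sqrt(k)
    return features @ proj

def pca_projection(features, factor=16):
    """Data-dependent baseline: keep the top d // factor principal directions."""
    k = features.shape[1] // factor
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def mean_distance_ratio(original, compressed, pairs=200, seed=1):
    """Average ratio of compressed to original distance over random pairs."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(original), size=pairs)
    j = rng.integers(0, len(original), size=pairs)
    keep = i != j  # skip degenerate pairs with zero distance
    d_orig = np.linalg.norm(original[i[keep]] - original[j[keep]], axis=1)
    d_comp = np.linalg.norm(compressed[i[keep]] - compressed[j[keep]], axis=1)
    return float(np.mean(d_comp / d_orig))

# Hypothetical stand-in for wide model activations with low intrinsic dimension.
rng = np.random.default_rng(42)
latent = rng.standard_normal((512, 20))
mixing = rng.standard_normal((20, 768))
features = latent @ mixing + 0.01 * rng.standard_normal((512, 768))

rp_features = random_projection(features)   # shape (512, 48)
pca_features = pca_projection(features)     # shape (512, 48)
rp_ratio = mean_distance_ratio(features, rp_features)
pca_ratio = mean_distance_ratio(features, pca_features)
print(f"random projection ratio: {rp_ratio:.3f}, PCA ratio: {pca_ratio:.3f}")
```

A ratio near 1 means the compressed features keep roughly the same geometry as the originals. On real models the relevant measure would be downstream task accuracy rather than this distance proxy.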
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · 3D Shape Modeling and Analysis
