Prototype Training with Dual Pseudo-Inverse and Optimized Hidden Activations
Mauro Tucci

TL;DR
Proto-PINV+H is a fast training method that combines closed-form weight computation with gradient-based optimization of synthetic inputs, soft labels, and hidden activations, reaching high accuracy within seconds on MNIST and Fashion-MNIST.
Contribution
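It proposes a training paradigm that shifts learning from weights to data and activations: weights are recomputed in closed form, and only the synthetic prototypes are trained. This enables fast training, and the paper also offers theoretical insights into generalization.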
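The practical consequence of this shift is that once a layer is fitted by ridge regression, its weights are a deterministic function of the data it is fitted to. The NumPy sketch below computes one such ridge-regularised pseudo-inverse solve in two algebraically equivalent forms: the usual d × d form and a k × k form that is cheaper when there are fewer prototypes than input features. The function names, sizes, and random arrays are hypothetical stand-ins; this summary does not say which form Proto-PINV+H uses.

```python
import numpy as np

def ridge_primal(X, Y, lam):
    """W = (X^T X + lam*I)^{-1} X^T Y; solves a d x d system (d = input features)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def ridge_kernel(X, Y, lam):
    """W = X^T (X X^T + lam*I)^{-1} Y; solves a k x k system (k = rows of X)."""
    k = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(k), Y)

rng = np.random.default_rng(0)
P = rng.standard_normal((20, 784))  # 20 hypothetical input prototypes, MNIST-sized
S = rng.standard_normal((20, 10))   # their (hypothetical) soft-label targets

W_primal = ridge_primal(P, S, lam=1e-2)
W_kernel = ridge_kernel(P, S, lam=1e-2)
print(W_kernel.shape, np.allclose(W_primal, W_kernel))
```

With k = 20 prototypes and d = 784 pixels, the kernel form solves a 20 × 20 system instead of a 784 × 784 one; the identity (X^T X + λI)^{-1} X^T = X^T (X X^T + λI)^{-1} holds for any λ > 0. Either way, changing the prototypes is the only way to change the weights.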
Findings
Achieves 97.8% test accuracy on MNIST and 89.3% on Fashion-MNIST, in 3.9 to 4.5 seconds of training.
Uses approximately 130k trainable parameters and only 250 training epochs.
Offers a better combined trade-off of accuracy, training speed, and model size than traditional shallow models.
Abstract
We present Proto-PINV+H, a fast training paradigm that combines closed-form weight computation with gradient-based optimisation of a small set of synthetic inputs, soft labels, and, crucially, hidden activations. At each iteration we recompute all weight matrices in closed form via two (or more) ridge-regularised pseudo-inverse solves, while updating only the prototypes with Adam. The trainable degrees of freedom are thus shifted from weight space to data/activation space. On MNIST (60k train, 10k test) and Fashion-MNIST (60k train, 10k test), our method reaches 97.8% and 89.3% test accuracy on the official 10k test sets, respectively, in 3.9 to 4.5 s using approximately 130k trainable parameters and only 250 epochs on an RTX 5060 (16GB). We provide a multi-layer extension (optimised activations at each hidden stage), learnable ridge parameters, optional PCA/PLS projections, and theory…
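The per-iteration loop described in the abstract (closed-form solves, then an Adam step on the prototypes) can be sketched as follows. This is a minimal NumPy illustration under assumptions the summary does not confirm: one hidden layer with a tanh nonlinearity, prototype hidden features obtained by passing the input prototypes through the first solved layer, a squared-error loss on the real training data, and a random toy problem in place of MNIST. To keep the gradient closed-form, the sketch updates only the soft labels `S` with a hand-written Adam step. The method as described also updates the synthetic inputs `P` and hidden activations `A`, which would require differentiating through the solves. All names are hypothetical.

```python
import numpy as np

def ridge_operator(X, lam):
    # M such that ridge(X, Y, lam) = M @ Y, in the k x k form M = X^T (X X^T + lam I)^{-1}
    k = X.shape[0]
    return np.linalg.solve(X @ X.T + lam * np.eye(k), X).T

def closed_form_step(P, A, X, lam1, lam2):
    # Solve 1 (assumed form): input prototypes P -> hidden-activation prototypes A
    W1 = ridge_operator(P, lam1) @ A      # d x h
    Hp = np.tanh(P @ W1)                  # prototype hidden features (tanh is an assumption)
    # Solve 2: prototype hidden features -> soft labels, with W2 = M2 @ S
    M2 = ridge_operator(Hp, lam2)         # h x k
    Hx = np.tanh(X @ W1)                  # hidden features of the real data
    return Hx, M2

# Hypothetical toy problem standing in for MNIST.
rng = np.random.default_rng(0)
n, d, h, c, k = 500, 50, 32, 3, 30
X = rng.standard_normal((n, d))
y = np.argmax(X[:, :c], axis=1)           # synthetic 3-class labels
Y = np.eye(c)[y]                          # one-hot targets

# Prototypes: the only trainable tensors (P and A stay frozen in this sketch).
idx = rng.choice(n, size=k, replace=False)
P, S = X[idx].copy(), Y[idx].copy()
A = rng.standard_normal((k, h))

lam1, lam2, lr, b1, b2, eps = 1.0, 1.0, 1e-2, 0.9, 0.999, 1e-8
m, v = np.zeros_like(S), np.zeros_like(S)
losses = []
for t in range(1, 301):
    # Recompute all weights in closed form from the current prototypes.
    Hx, M2 = closed_form_step(P, A, X, lam1, lam2)
    K = Hx @ M2                           # predictions are K @ S (linear in S)
    R = K @ S - Y                         # residuals on the real data
    losses.append(0.5 * np.mean(np.sum(R ** 2, axis=1)))
    grad_S = K.T @ R / n                  # gradient of the loss above w.r.t. S
    # Adam update on the soft labels only.
    m = b1 * m + (1 - b1) * grad_S
    v = b2 * v + (1 - b2) * grad_S ** 2
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    S -= lr * m_hat / (np.sqrt(v_hat) + eps)

acc = np.mean(np.argmax(K @ S, axis=1) == y)
print(f"loss {losses[0]:.4f} -> {losses[-1]:.4f}, train accuracy {acc:.3f}")
```

Because `P` and `A` are frozen here, `W1` and the operator `M2` do not actually change between iterations; they are recomputed only to mirror the per-iteration closed-form step. Updating all three prototype tensors is what moves the trainable degrees of freedom fully into data/activation space.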
