Prototype Training with Dual Pseudo-Inverse and Optimized Hidden Activations

Mauro Tucci

arXiv:2508.09787·cs.LG·August 14, 2025

TL;DR

Proto-PINV+H is a fast training method that recomputes all weights in closed form while optimizing a small set of synthetic inputs, soft labels, and hidden activations with Adam. It reaches 97.8% test accuracy on MNIST and 89.3% on Fashion-MNIST within a few seconds of training.

Contribution

It proposes a novel training paradigm that shifts learning from weights to data and activations, enabling fast training with theoretical insights into generalization.

Findings

01

Reaches 97.8% test accuracy on MNIST, with reported training times of 3.9 s to 4.5 s on an RTX 5060 (16GB).

02

Uses approximately 130k trainable parameters with 250 epochs.

03

Outperforms traditional shallow models in accuracy-speed-size trade-offs.

Abstract

We present Proto-PINV+H, a fast training paradigm that combines closed-form weight computation with gradient-based optimisation of a small set of synthetic inputs, soft labels, and, crucially, hidden activations. At each iteration we recompute all weight matrices in closed form via two (or more) ridge-regularised pseudo-inverse solves, while updating only the prototypes with Adam. The trainable degrees of freedom are thus shifted from weight space to data/activation space. On MNIST (60k train, 10k test) and Fashion-MNIST (60k train, 10k test), our method reaches 97.8% and 89.3% test accuracy on the official 10k test sets, respectively, in 3.9 s to 4.5 s using approximately 130k trainable parameters and only 250 epochs on an RTX 5060 (16GB). We provide a multi-layer extension (optimised activations at each hidden stage), learnable ridge parameters, optional PCA/PLS projections, and theory…
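
To make the "two ridge-regularised pseudo-inverse solves" concrete, here is a minimal NumPy sketch of how both weight matrices could be recomputed from a set of prototypes. It is not the author's implementation. The names `ridge_solve` and `dual_pinv_weights`, the tanh nonlinearity, the fixed ridge values, the toy sizes, and the choice to build the second solve on the activations produced by the first are all assumptions made for illustration. The sketch also omits the Adam updates of the prototypes, which in the paper happen around these solves.

```python
import numpy as np

def ridge_solve(A, B, lam):
    """Closed-form minimiser of ||A W - B||^2 + lam * ||W||^2."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ B)

def dual_pinv_weights(X_p, H_p, Y_p, lam1=1e-2, lam2=1e-2):
    """Recompute both weight matrices from prototypes (hypothetical helper).

    X_p: synthetic inputs, H_p: target hidden activations, Y_p: soft labels.
    """
    # Solve 1: map prototype inputs to their target hidden activations.
    W1 = ridge_solve(X_p, H_p, lam1)
    # Solve 2: map the resulting hidden features to the soft labels.
    # Using tanh(X_p @ W1) here is an assumption of this sketch.
    Z = np.tanh(X_p @ W1)
    W2 = ridge_solve(Z, Y_p, lam2)
    return W1, W2

def predict(X, W1, W2):
    """Forward pass of the two-layer model defined by W1 and W2."""
    return np.tanh(X @ W1) @ W2

# Toy sizes, chosen only so the example runs instantly.
rng = np.random.default_rng(0)
n_proto, d_in, d_hid, n_cls = 20, 8, 16, 3
X_p = rng.normal(size=(n_proto, d_in))
H_p = rng.normal(size=(n_proto, d_hid))
Y_p = np.eye(n_cls)[rng.integers(0, n_cls, size=n_proto)]

W1, W2 = dual_pinv_weights(X_p, H_p, Y_p)
scores = predict(X_p, W1, W2)
```

In a full training loop, `X_p`, `H_p`, and `Y_p` would be the trainable quantities: a loss on real training batches would be differentiated through both solves and the prototypes updated with Adam, while `W1` and `W2` are never stored as parameters.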
