Transfer entropy and O-information to detect grokking in tensor network multi-class classification problems

Domenico Pomarico; Roberto Cilli; Alfonso Monaco; Loredana Bellantuono; Marianna La Rocca; Tommaso Maggipinto; Giuseppe Magnifico; Marlis Ontivero Ortega; Ester Pantaleo; Sabina Tangaro; Sebastiano Stramaglia; Roberto Bellotti; Nicola Amoroso

arXiv:2507.23346·quant-ph·September 30, 2025

TL;DR

This paper investigates grokking in tensor network classifiers for multi-class problems, using information-theoretic tools such as transfer entropy and O-information to characterize the dynamics of generalization and overfitting.

Contribution

It introduces a novel application of transfer entropy and O-information to analyze grokking in quantum-inspired tensor network models for multi-class classification.
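This summary does not state which transfer-entropy estimator the authors use. As a minimal sketch, assuming jointly Gaussian signals and a history length of one step, transfer entropy from a source series Y to a target series X reduces to half the log-ratio of two regression residual variances (half the Granger-causality statistic). In the paper's setting, the inputs would be time series derived from the label-specific masks across training sweeps; that mapping, and the function names below, are hypothetical.

```python
import numpy as np


def residual_variance(target, predictors):
    """Variance of the least-squares residuals of `target` on `predictors` plus an intercept."""
    design = np.column_stack([np.ones(len(target))] + list(predictors))
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    return residuals.var()


def gaussian_transfer_entropy(source, target):
    """Transfer entropy source -> target in nats.

    Assumes jointly Gaussian, stationary series and a history length of 1 step
    (hypothetical choice; real analyses tune the embedding).
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    x_now, x_past, y_past = target[1:], target[:-1], source[:-1]
    # Restricted model: predict X_t from its own past only.
    var_restricted = residual_variance(x_now, [x_past])
    # Full model: add the source's past; the log-ratio measures the extra predictive information.
    var_full = residual_variance(x_now, [x_past, y_past])
    return 0.5 * np.log(var_restricted / var_full)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    y = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        # X is driven by the previous value of Y; Y is independent white noise.
        x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + rng.standard_normal()
    print("TE(Y->X):", gaussian_transfer_entropy(y, x))
    print("TE(X->Y):", gaussian_transfer_entropy(x, y))
```

For this toy process, TE(Y→X) should come out close to 0.5·ln(1.64) ≈ 0.25 nats, while TE(X→Y) should be near zero, reflecting the one-directional coupling.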

Findings

01

On Fashion-MNIST, grokking coincides with an entanglement transition and a peak in redundant information.

02

The overfitted hyper-spectral model shows persistently synergistic, disordered behavior.

03

High-order information dynamics are crucial for understanding generalization in quantum-inspired learning.
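Finding 01 refers to an entanglement transition. For a bipartition of an MPS, the entanglement entropy is the von Neumann entropy of the squared Schmidt coefficients across the cut. The sketch below is a minimal illustration, assuming a small dense state vector whose Schmidt coefficients are obtained with an SVD; for a trained MPS one would instead read them off the bond in mixed-canonical form. The function name is hypothetical.

```python
import numpy as np


def bipartite_entanglement_entropy(state, n_sites, cut, local_dim=2):
    """Von Neumann entropy (nats) across the bond between sites cut-1 and cut.

    Assumes `state` is a dense vector of length local_dim**n_sites with site 0
    as the most significant index (C ordering).
    """
    psi = np.asarray(state, dtype=complex)
    if psi.size != local_dim**n_sites:
        raise ValueError("state size does not match n_sites and local_dim")
    # Group sites [0, cut) as rows and [cut, n_sites) as columns.
    matrix = psi.reshape(local_dim**cut, local_dim ** (n_sites - cut))
    # Singular values of this matrix are the Schmidt coefficients.
    schmidt = np.linalg.svd(matrix, compute_uv=False)
    probs = schmidt**2
    probs = probs / probs.sum()  # normalize in case the state is not
    probs = probs[probs > 1e-12]  # drop zeros to avoid log(0)
    return float(-np.sum(probs * np.log(probs)))


if __name__ == "__main__":
    # 4-site GHZ state: every cut carries exactly ln 2 of entanglement.
    ghz = np.zeros(16)
    ghz[0] = ghz[15] = 1 / np.sqrt(2)
    profile = [bipartite_entanglement_entropy(ghz, 4, cut) for cut in range(1, 4)]
    print(profile)
```

Tracking such a profile over the bonds after each training sweep is one way to visualize an entanglement transition; a product state gives zero at every cut.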

Abstract

Quantum-enhanced machine learning, encompassing both quantum algorithms and quantum-inspired classical methods such as tensor networks, offers promising tools for extracting structure from complex, high-dimensional data. In this work, we study the training dynamics of Matrix Product State (MPS) classifiers applied to three-class problems, using both fashion MNIST and hyper-spectral satellite imagery as representative datasets. We investigate the phenomenon of grokking, where generalization emerges suddenly after memorization, by tracking entanglement entropy, local magnetization, and model performance across training sweeps. Additionally, we employ information theory tools to gain deeper insights: transfer entropy is used to reveal causal dependencies between label-specific quantum masks, while O-information captures the shift from synergistic to redundant correlations among class…
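The abstract describes O-information as separating synergy from redundancy. For variables X = (X_1, …, X_n) it is Ω(X) = (n − 2)·H(X) + Σ_j [H(X_j) − H(X_{−j})], where X_{−j} omits X_j; Ω > 0 indicates redundancy-dominated and Ω < 0 synergy-dominated dependencies. The estimator used in the paper is not given in this summary, so the sketch below assumes a Gaussian estimator built from a covariance matrix (for data, e.g. `np.cov(samples, rowvar=False)`); function names are hypothetical.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (nats) of a Gaussian with covariance `cov`."""
    cov = np.atleast_2d(cov)
    k = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (k * np.log(2 * np.pi * np.e) + logdet)


def o_information(cov):
    """O-information of jointly Gaussian variables (Gaussian estimator assumed)."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    omega = (n - 2) * gaussian_entropy(cov)
    for j in range(n):
        rest = [i for i in range(n) if i != j]
        # H(X_j) - H(X_{-j}) for each variable j.
        omega += gaussian_entropy(cov[j, j]) - gaussian_entropy(cov[np.ix_(rest, rest)])
    return omega


if __name__ == "__main__":
    # Redundant: three noisy copies of one common source (noise variance 0.25).
    redundant = np.full((3, 3), 1.0) + 0.25 * np.eye(3)
    # Synergistic: X3 = X1 + X2 + small noise, with X1 and X2 independent.
    synergistic = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.01]])
    print(o_information(redundant), o_information(synergistic))
```

The redundant example gives Ω ≈ 0.40 nats and the synergistic one Ω ≈ −1.96 nats, matching the sign convention above. The (2πe) constants cancel in Ω, so only the log-determinants actually matter.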