The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently
Elisabetta Cornacchia, Dan Mikulincer, Elchanan Mossel

TL;DR
This paper shows that temporal correlations in data generated by a lazy random walk on the hypercube make Boolean k-juntas efficiently learnable by gradient-based methods, in contrast to the barriers these methods face under independent uniform samples.
Contribution
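It shows that a two-layer ReLU network trained with stylized SGD on a temporal-difference loss can exploit temporal dependencies to learn sparse Boolean functions efficiently, whereas large-batch gradient methods with standard convex pointwise losses do not gain the same advantage.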
Findings
For every fixed k, temporal correlations yield a sample complexity for learning k-juntas that is essentially linear in the ambient dimension d.
Two-layer ReLU networks trained with a temporal-difference loss, which compares target and predicted increments across consecutive samples, can exploit the dependencies in random-walk data.
Large-batch gradient methods using standard convex pointwise losses do not gain the same advantage from temporal correlations.
Abstract
We study how temporal correlations in the data can make certain sparse learning problems efficiently learnable by gradient-based methods. Our focus is on Boolean k-juntas, a canonical sparse learning problem known to pose barriers for gradient-based methods under independent uniform samples. We show that this picture changes when the samples are generated by a lazy random walk on the hypercube. In this setting, the temporal dependencies can be exploited by a two-layer ReLU network trained using stylized-SGD with a temporal-difference loss, which compares target and predicted increments across consecutive samples. For every fixed k, the resulting sample complexity is essentially linear in the ambient dimension d. By contrast, we show that for large-batch gradient methods using standard convex pointwise losses, temporal correlations do not provide the same advantage.
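To make the setting in the abstract concrete, here is a minimal NumPy sketch of a lazy random walk on the hypercube, a k-junta target, and a temporal-difference loss. Several details are assumptions made for illustration and are not taken from the paper: the walk picks one uniformly random coordinate per step and flips it with probability 1/2, the junta is a parity over a hypothetical support set, the loss is a squared difference of increments, and the function names (`lazy_random_walk`, `parity_junta`, `two_layer_relu`, `temporal_difference_loss`) are invented here. The paper's stylized SGD procedure is not reproduced.

```python
import numpy as np


def lazy_random_walk(d, steps, rng):
    """Sample a path on {-1, +1}^d.

    Assumed laziness: at each step one uniformly random coordinate is
    chosen and flipped with probability 1/2, otherwise the walk stays put.
    """
    x = rng.choice([-1, 1], size=d)
    path = np.empty((steps + 1, d), dtype=np.int64)
    path[0] = x
    for t in range(1, steps + 1):
        x = x.copy()
        i = rng.integers(d)
        if rng.random() < 0.5:
            x[i] = -x[i]
        path[t] = x
    return path


def parity_junta(X, support):
    """Example k-junta: the parity (product) of the coordinates in `support`."""
    return np.prod(X[:, support], axis=1)


def two_layer_relu(X, W, b, a):
    """Two-layer ReLU network: sum_j a_j * relu(<w_j, x> + b_j)."""
    return np.maximum(X @ W.T + b, 0.0) @ a


def temporal_difference_loss(path, target, model):
    """Assumed squared form: compare target and predicted increments
    between consecutive samples along the path."""
    y = target(path)
    p = model(path)
    dy = y[1:] - y[:-1]  # target increments
    dp = p[1:] - p[:-1]  # predicted increments
    return float(np.mean((dy - dp) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, width, support = 20, 8, [0, 3, 7]  # hypothetical sizes and support
    path = lazy_random_walk(d, steps=500, rng=rng)
    W = rng.normal(size=(width, d)) / np.sqrt(d)
    b = rng.normal(size=width)
    a = rng.normal(size=width)
    loss = temporal_difference_loss(
        path,
        lambda X: parity_junta(X, support),
        lambda X: two_layer_relu(X, W, b, a),
    )
    print(f"temporal-difference loss at random init: {loss:.4f}")
```

In this sketch, consecutive samples differ in at most one coordinate, so the target increment is nonzero only when the moved coordinate lies in the junta's support. That is one intuitive way to see why increments along the walk can carry information about the relevant coordinates that independent uniform samples do not expose directly.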
