The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently

Elisabetta Cornacchia; Dan Mikulincer; Elchanan Mossel

arXiv:2605.10237·cs.LG·May 12, 2026

TL;DR

This paper shows that when samples are generated by a lazy random walk on the hypercube, rather than drawn independently and uniformly, the resulting temporal correlations let gradient-based methods learn Boolean k-juntas efficiently.

Contribution

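It shows that a two-layer ReLU network trained by a stylized SGD with a temporal-difference loss can exploit temporal dependencies to learn sparse Boolean functions efficiently, whereas large-batch gradient methods using standard convex pointwise losses do not gain the same advantage.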

Findings

01. For every fixed k, temporal correlations yield a sample complexity for learning k-juntas that is essentially linear in the ambient dimension d.

02. A two-layer ReLU network trained with a temporal-difference loss can exploit these data dependencies.

03. Large-batch gradient methods using standard convex pointwise losses do not gain the same advantage from temporal correlations.

Abstract

We study how temporal correlations in the data can make certain sparse learning problems efficiently learnable by gradient-based methods. Our focus is on Boolean k-juntas, a canonical sparse learning problem known to pose barriers for gradient-based methods under independent uniform samples. We show that this picture changes when the samples are generated by a lazy random walk on the hypercube. In this setting, the temporal dependencies can be exploited by a two-layer ReLU network trained using stylized-SGD with a temporal-difference loss, which compares target and predicted increments across consecutive samples. For every fixed k, the resulting sample complexity is essentially linear in the ambient dimension d. By contrast, we show that for large-batch gradient methods using standard convex pointwise losses, temporal correlations do not provide the same advantage.
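The setting described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the lazy walk is taken to pick one uniformly random coordinate per step and flip it with probability 1/2, the k-junta is taken to be a parity on a hypothetical support, and the temporal-difference loss is taken to be the mean squared difference between predicted and target increments. The function names (`lazy_random_walk`, `parity_junta`, `two_layer_relu`, `temporal_difference_loss`) are hypothetical.

```python
import numpy as np


def lazy_random_walk(d, steps, rng):
    """Sample a path of a lazy random walk on {-1,+1}^d.

    Assumed variant: at each step pick one uniformly random coordinate
    and flip it with probability 1/2 (otherwise stay in place).
    """
    x = rng.choice([-1, 1], size=d)
    path = np.empty((steps + 1, d), dtype=np.int64)
    path[0] = x
    for t in range(1, steps + 1):
        i = rng.integers(d)
        if rng.random() < 0.5:
            x[i] = -x[i]
        path[t] = x  # copies the current state into the path
    return path


def parity_junta(X, support):
    """Example k-junta (an assumption): parity of the coordinates in `support`."""
    return np.prod(X[:, support], axis=1)


def two_layer_relu(X, W, b, a):
    """Two-layer ReLU network: sum_j a_j * relu(<w_j, x> + b_j)."""
    return np.maximum(X @ W.T + b, 0.0) @ a


def temporal_difference_loss(pred, target):
    """Mean squared gap between predicted and target increments
    across consecutive samples (squared loss is an assumption)."""
    return np.mean((np.diff(pred) - np.diff(target)) ** 2)


# Usage: evaluate the loss of a randomly initialized network along one walk.
rng = np.random.default_rng(0)
d, k, width, steps = 30, 3, 16, 500
path = lazy_random_walk(d, steps, rng)
y = parity_junta(path, support=[0, 1, 2])  # hypothetical support of size k
W = rng.normal(size=(width, d)) / np.sqrt(d)
b = rng.normal(size=width)
a = rng.normal(size=width) / width
loss = temporal_difference_loss(two_layer_relu(path, W, b, a), y)
```

Under these assumptions, consecutive samples differ in at most one coordinate, so the target increment is nonzero only when a coordinate in the support flips. This is one way to see why comparing increments, rather than pointwise values, can expose the relevant coordinates.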
