Latent Chain-of-Thought Improves Structured-Data Transformers
Carson Dudley, Samet Oymak

TL;DR
This paper introduces a latent chain-of-thought mechanism for structured-data transformers: running multiple rounds of latent computation before prediction improves accuracy across time-series and tabular datasets.
Contribution
It proposes a recurrent scheme with feedback tokens for structured-data transformers and reports improved performance on time-series and tabular datasets, including when the scheme is applied to a foundation model.
Findings
Latent chain-of-thought improves performance on 7/9 time-series datasets (+12.63%).
Latent chain-of-thought improves performance on 23/27 tabular datasets (+3.25%).
Applying latent CoT to a small foundation model outperforms larger models on tabular tasks.
Abstract
Chain-of-thought and more broadly test-time compute are known to augment the expressive capabilities of language models and have led to major innovations in reasoning. Motivated by this success, this paper explores latent chain-of-thought as well as the impact of depth and looping for time-series and tabular data. We propose a recurrent scheme in which a structured-data transformer, after an initial forward pass, compresses its query-position hidden states into feedback tokens that are appended to the input and processed again, allowing multiple rounds of latent computation before prediction. We compare CoT models against a same-depth no-CoT baseline, a deeper baseline matched to the CoT model in effective depth, and a looped transformer with weight-tied recurrence but no additional chain-of-thought tokens. Across 36 datasets in time-series forecasting and tabular prediction, latent…
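The feedback loop described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a toy one-layer, non-causal transformer with random weights, a linear projection as the "compression" step (one feedback token per query position), and feedback tokens that accumulate across rounds. All names (`TinyTransformer`, `latent_cot_forward`, `w_fb`, `w_out`, `n_rounds`) are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


class TinyTransformer:
    """Toy stand-in for a structured-data transformer: one layer, one head, no mask."""

    def __init__(self, d_model, rng):
        s = 1.0 / np.sqrt(d_model)
        self.wq = rng.normal(0.0, s, (d_model, d_model))
        self.wk = rng.normal(0.0, s, (d_model, d_model))
        self.wv = rng.normal(0.0, s, (d_model, d_model))
        self.wo = rng.normal(0.0, s, (d_model, d_model))
        self.w1 = rng.normal(0.0, s, (d_model, 4 * d_model))
        self.w2 = rng.normal(0.0, 0.5 * s, (4 * d_model, d_model))

    def __call__(self, x):
        # Self-attention sublayer with a residual connection.
        h = layer_norm(x)
        q, k, v = h @ self.wq, h @ self.wk, h @ self.wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))
        x = x + (attn @ v) @ self.wo
        # ReLU feed-forward sublayer with a residual connection.
        h = layer_norm(x)
        return x + np.maximum(h @ self.w1, 0.0) @ self.w2


def latent_cot_forward(model, context, queries, w_fb, w_out, n_rounds=2):
    """Run an initial pass, then n_rounds of feedback passes, then predict.

    context: (n_ctx, d) embedded context rows; queries: (n_q, d) embedded query rows.
    w_fb: (d, d) feedback projection (assumed form of the compression step).
    w_out: (d, 1) prediction head. Returns (predictions, final sequence length).
    """
    n_ctx, n_q = len(context), len(queries)
    tokens = np.concatenate([context, queries], axis=0)
    hidden = model(tokens)  # initial forward pass
    for _ in range(n_rounds):
        q_hidden = hidden[n_ctx:n_ctx + n_q]      # hidden states at query positions
        feedback = layer_norm(q_hidden) @ w_fb    # compress into feedback tokens
        tokens = np.concatenate([tokens, feedback], axis=0)  # append to the input
        hidden = model(tokens)                    # process again with the same weights
    return hidden[n_ctx:n_ctx + n_q] @ w_out, tokens.shape[0]


rng = np.random.default_rng(0)
d = 16
model = TinyTransformer(d, rng)
context = rng.normal(size=(10, d))   # e.g. labelled rows or past time steps
queries = rng.normal(size=(4, d))    # positions to predict
w_fb = rng.normal(0.0, 1.0 / np.sqrt(d), (d, d))
w_out = rng.normal(0.0, 1.0 / np.sqrt(d), (d, 1))
preds, seq_len = latent_cot_forward(model, context, queries, w_fb, w_out, n_rounds=2)
print(preds.shape, seq_len)
```

With `n_rounds=0` the function reduces to a single forward pass, which corresponds to the same-depth no-CoT baseline mentioned in the abstract. Each additional round reuses the same weights but sees a longer input, which is what separates this scheme from a looped transformer that adds no extra tokens.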
