Causal Masking on Spatial Data: An Information-Theoretic Case for Learning Spatial Datasets with Unimodal Language Models
Jared Junkin, Samuel Nathanson

TL;DR
This paper shows that causal masking can be applied effectively to spatial data such as chess boards, yielding stronger models than training on sequential (move-based) data, with broader implications for spatial data modeling.
Contribution
The study provides the first systematic analysis of causal masking on spatial data, showing its viability and potential advantages over sequential representations in language models.
Findings
Models trained on spatial board states, even with causal masking, achieve stronger playing strength than models trained on sequential move data.
Causal masking on spatial data is a viable and sometimes preferable training method.
Results suggest broader applicability of causal masking in spatial domains.
Abstract
Language models are traditionally designed around causal masking. In domains with spatial or relational structure, causal masking is often viewed as inappropriate, and sequential linearizations are instead used. Yet the question of whether it is viable to accept the information loss introduced by causal masking on nonsequential data has received little direct study, in part because few domains offer both spatial and sequential representations of the same dataset. In this work, we investigate this issue in the domain of chess, which naturally supports both representations. We train language models with bidirectional and causal self-attention mechanisms on both spatial (board-based) and sequential (move-based) data. Our results show that models trained on spatial board states, even with causal masking, consistently achieve stronger playing strength than models trained on…
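As a rough illustration of the information loss the abstract refers to, the sketch below compares a causal and a bidirectional attention mask over a chess board flattened to 64 tokens. The one-token-per-square encoding, the flattening order, and the helper name `attention_mask` are assumptions made for this example; the paper's actual tokenization is not described here.

```python
def attention_mask(n_tokens, causal):
    """Return mask[i][j] == True when position i may attend to position j."""
    if causal:
        # Causal: each position sees itself and earlier positions only.
        return [[j <= i for j in range(n_tokens)] for i in range(n_tokens)]
    # Bidirectional: every position sees every position.
    return [[True] * n_tokens for _ in range(n_tokens)]


N_SQUARES = 64  # assumed: one token per square of an 8x8 board, in a fixed order

causal_mask = attention_mask(N_SQUARES, causal=True)
bidir_mask = attention_mask(N_SQUARES, causal=False)

# How many squares each square can see under each mask.
visible_causal = [sum(row) for row in causal_mask]
visible_bidir = [sum(row) for row in bidir_mask]

# Fraction of (query square, key square) pairs hidden by the causal mask.
total_pairs = N_SQUARES * N_SQUARES
hidden_fraction = 1.0 - sum(visible_causal) / total_pairs

print("first square sees:", visible_causal[0], "of", visible_bidir[0])
print("last square sees:", visible_causal[-1], "of", visible_bidir[-1])
print("hidden pair fraction:", hidden_fraction)
```

Under these assumptions, the first square in the ordering attends only to itself while the last attends to the whole board, so roughly half of all square-to-square interactions are masked out. That is the cost the abstract weighs against the drawbacks of a sequential, move-based representation.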
Taxonomy
Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Topic Modeling
