Latent Planning via Embedding Arithmetic: A Contrastive Approach to Strategic Reasoning
Andrew Hamara, Greg Hamerly, Pablo Rivas, Andrew C. Freeman

TL;DR
This paper introduces SOLIS, a contrastively learned embedding space for strategic planning in high-dimensional decision tasks like chess, enabling efficient, lightweight planning without explicit policy or dynamics models.
Contribution
The paper presents SOLIS, a novel contrastive learning approach that creates an evaluation-aligned latent space for planning, reducing reliance on explicit policy or dynamics models.
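Supervised contrastive learning pulls embeddings that share a label together and pushes the rest apart. The sketch below implements the standard supervised contrastive (SupCon) loss in plain Python. It assumes that each position carries a discrete outcome label, for instance a bucketed engine evaluation. The labeling scheme, the temperature value, and the helper names are illustrative assumptions, not details taken from the paper.

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive (SupCon) loss, averaged over anchors that have a positive.

    Assumption: `labels` are discrete outcome labels (e.g. bucketed evaluations).
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    if n < 2:
        return 0.0
    total, counted = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Temperature-scaled cosine similarities between anchor i and every other sample.
        logits = {j: dot(z[i], z[j]) / temperature for j in others}
        # log(sum(exp(logits))) computed stably by subtracting the max.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        # Positives: other samples sharing the anchor's outcome label.
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0

emb = [[1.0, 0.1], [0.9, 0.2], [-1.0, 0.0], [-0.8, -0.1]]
clustered = supcon_loss(emb, ["win", "win", "loss", "loss"])
mixed = supcon_loss(emb, ["win", "loss", "win", "loss"])
print(clustered < mixed)  # True: labels that match the geometry give a lower loss
```

Minimizing this loss arranges the embedding so that positions with similar outcomes sit close together. That proximity is the "outcome similarity" structure the planning step later exploits.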
Findings
SOLIS achieves competitive chess performance with only a shallow search, under constrained conditions.
Evaluation-aligned embeddings facilitate effective planning in high-dimensional spaces.
The approach offers a lightweight alternative to dynamics models or policy learning.
Abstract
Planning in high-dimensional decision spaces is increasingly being studied through the lens of learned representations. Rather than training policies or value heads, we investigate whether planning can be carried out directly in an evaluation-aligned embedding space. We introduce SOLIS, which learns such a space using supervised contrastive learning. In this representation, outcome similarity is captured by proximity, and a single global advantage vector orients the space from losing to winning regions. Candidate actions are then ranked according to their alignment with this direction, reducing planning to vector operations in latent space. We demonstrate this approach in chess, where SOLIS uses only a shallow search guided by the learned embedding to reach competitive strength under constrained conditions. More broadly, our results suggest that evaluation-aligned latent planning offers…
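The core mechanic in the abstract is to orient the space with one global direction and rank moves by how well they align with it. The sketch below illustrates that mechanic under stated assumptions. The advantage vector is taken as the unit vector from the mean embedding of strongly losing positions to the mean embedding of strongly winning positions. One reviewer describes the construction as an average over extreme positions, but the exact recipe is not given here. Candidates are then ranked by cosine similarity with that direction. `advantage_vector` and `rank_candidates` are hypothetical names.

```python
import math

def mean(vectors):
    d = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def advantage_vector(winning, losing):
    # Assumed construction: unit vector from the losing centroid to the winning centroid.
    d = [w - l for w, l in zip(mean(winning), mean(losing))]
    n = norm(d)
    return [x / n for x in d]

def rank_candidates(candidates, direction):
    # candidates: dict move -> embedding of the position the move leads to.
    # Moves whose resulting position aligns best with the advantage direction come first.
    return sorted(candidates, key=lambda m: cosine(candidates[m], direction), reverse=True)

winning = [[2.0, 1.0], [3.0, 0.0]]
losing = [[-2.0, 1.0], [-3.0, 0.0]]
a = advantage_vector(winning, losing)
print(a)  # [1.0, 0.0]
moves = {"e4": [1.0, 0.2], "a3": [-0.5, 1.0], "Nf3": [0.8, 0.0]}
print(rank_candidates(moves, a))  # ['Nf3', 'e4', 'a3']
```

With this construction, planning reduces to vector arithmetic: the same fixed direction is used to score every candidate, with no value head or dynamics model involved.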
Peer Reviews
Decision·Submitted to ICLR 2026
- Original conceptual framing: The paper introduces an appealing idea of planning as geometric movement within a learned evaluation-aligned space, offering a fresh perspective that reframes search as navigation in representation space rather than explicit value prediction.
- Interpretability and insight: UMAP visualizations and latent trajectories show that game progression corresponds to smooth motion along the advantage direction; decisive games exhibit coherent latent flow. This offers rare

- Questionable novelty and conceptual framing: The “latent planning via advantage direction” boils down to using a linear classifier in embedding space that correlates with Stockfish’s scalar evaluation. Functionally, this is equivalent to training a value network and ranking continuations by predicted value, just with an extra cosine projection. The contrastive loss does not produce new algorithmic behavior, only a reparametrization of regression in latent space.
- The claim of planning in l

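The reviewer's equivalence argument can be checked numerically. This check assumes the embeddings are L2-normalized, which is common in contrastive training but is an assumption here. Under that assumption, cosine similarity with a fixed unit direction equals the linear score `direction · z`. Ranking by alignment is therefore ranking by a linear value head whose weights are `direction`. On unnormalized embeddings the two orderings can differ, so the claim hinges on normalization. All values below are made up for illustration.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

direction = normalize([1.0, 2.0])  # hypothetical unit-length "advantage" direction
raw = {"m1": [3.0, 0.5], "m2": [0.2, 0.9], "m3": [-1.0, 4.0]}
unit = {m: normalize(v) for m, v in raw.items()}

# On unit-norm embeddings, cosine ranking and linear-probe ranking coincide.
by_cosine = sorted(unit, key=lambda m: cosine(unit[m], direction), reverse=True)
by_linear = sorted(unit, key=lambda m: dot(unit[m], direction), reverse=True)
print(by_cosine, by_linear)  # ['m2', 'm3', 'm1'] ['m2', 'm3', 'm1']

# Without normalization, a plain dot product favors large-norm embeddings instead.
by_raw_dot = sorted(raw, key=lambda m: dot(raw[m], direction), reverse=True)
print(by_raw_dot)  # ['m3', 'm1', 'm2']
```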
* The paper clearly articulates its main ideas and presents them in an accessible way. It effectively demonstrates how to construct an evaluation-aligned latent space using contrastive learning, extract a single advantage vector, and use this direction to guide a compact search procedure.
* The two scoring variants, anchored and unanchored, are intuitive and well explained.
* The latent space visualizations are insightful and visually engaging, helping to illustrate the internal structure of t

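The review names anchored and unanchored scoring but does not define them. The sketch below therefore assumes one plausible reading. An unanchored score rates the resulting position by its own cosine alignment with the advantage direction. An anchored score rates the move's displacement (child minus parent embedding) projected onto that direction. A `top_k` cut is shown as one hypothetical way such a score could bound the width of a compact search. All function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def unanchored_score(child, direction):
    # Hypothetical: alignment of the resulting position itself with the advantage direction.
    return dot(child, direction) / (norm(child) * norm(direction))

def anchored_score(parent, child, direction):
    # Hypothetical: projection of the move's displacement (child - parent) onto the
    # advantage direction, i.e. how far the move pushes the position toward "winning".
    step = [c - p for c, p in zip(child, parent)]
    return dot(step, direction) / norm(direction)

def top_k(parent, children, direction, k, anchored=True):
    # Keep only the k best-scoring moves, e.g. to limit branching in a shallow search.
    if anchored:
        key = lambda m: anchored_score(parent, children[m], direction)
    else:
        key = lambda m: unanchored_score(children[m], direction)
    return sorted(children, key=key, reverse=True)[:k]

direction = [1.0, 0.0]
parent = [0.5, 0.5]
children = {"a": [0.6, 0.0], "b": [2.0, 3.0], "c": [-1.0, 0.5]}
print(top_k(parent, children, direction, 2))                  # ['b', 'a']
print(top_k(parent, children, direction, 2, anchored=False))  # ['a', 'b']
```

Under this reading, the two variants can disagree. Move "a" lands almost exactly on the advantage axis but barely moves along it. Move "b" makes a large step toward the winning side while drifting off-axis.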
The paper has two main issues that need clarification and refinement:
* The construction of the single global “advantage vector” is underspecified and potentially fragile.
* The evaluation framing and the “2500+ Elo” claim are somewhat misleading.

*Issue 1*: The central idea of the paper is that a single vector in the embedding space captures the direction from “losing” to “winning” positions. However, the process of creating a single advantage vector by averaging the embedding across extreme pos

This paper is extremely well written. It is clear from the paper what its contributions are, what the method is, and what problem the paper is attempting to solve. The method is simple and could likely transfer to other domains where search can be used.

1. I have concerns about the lack of included baselines. One of the papers that I am fairly familiar with in this area was not mentioned at all [1], and the mention of AlphaZero led me to expect a comparison with it as well, since it is one of the main algorithms in this space. A comparison to the various Dreamer versions could also be very useful here.
2. I think Table 2 is missing information about the configuration used to generate these results. How man
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · AI-based Problem Solving and Planning
