SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control
Stavros Orfanoudakis, Pedro P. Vergara

TL;DR
SAVGO introduces a geometry-aware reinforcement learning method that uses cosine similarity in a learned state-action space to improve policy updates and performance on continuous control tasks.
Contribution
The paper proposes an approach that explicitly incorporates value-based similarity into policy updates via a learned geometry, unifying representation learning, value estimation, and policy optimization.
Findings
SAVGO outperforms strong baselines on MuJoCo benchmarks.
The learned geometry improves policy guidance beyond local gradients.
Ablation studies confirm the effectiveness of value-geometry learning.
Abstract
While representation and similarity learning have improved the sample efficiency of Reinforcement Learning (RL), they are rarely used to shape policy updates directly in the action space. To bridge this gap, a geometry-aware RL algorithm that explicitly incorporates value-based similarity into the policy update, State-Action Value Geometry Optimization (SAVGO), is proposed. In detail, SAVGO learns a joint state-action embedding space in which pairs with similar action-value estimates exhibit high cosine similarity, while dissimilar pairs are mapped to distinct directions. This learned geometry enables the generation of a similarity kernel over candidate actions sampled at each update, allowing policy improvement to be guided directly toward higher-value regions beyond local gradient-based updates. As a result, representation learning, value estimation, and policy optimization are…
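To make the similarity-kernel idea in the abstract concrete, here is a minimal NumPy sketch of a cosine-similarity kernel over candidate actions. It is an illustration under stated assumptions, not the authors' implementation. The embeddings below are random stand-ins for the learned state-action embedding, and the function names, the softmax weighting, and the `temperature` parameter are hypothetical choices made for this example only.

```python
import numpy as np


def cosine_similarity_matrix(embeddings, eps=1e-8):
    """Pairwise cosine similarity between the rows of `embeddings`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.maximum(norms, eps)  # guard against zero-norm rows
    return unit @ unit.T


def similarity_weighted_target(actions, embeddings, q_values, temperature=0.1):
    """Blend candidate actions by their similarity to the best candidate.

    Assumption for illustration: candidates whose embeddings point in the
    same direction as the highest-Q candidate get more weight. The paper's
    actual kernel and policy update may differ.
    """
    sim = cosine_similarity_matrix(embeddings)
    best = int(np.argmax(q_values))        # index of the highest-value candidate
    logits = sim[best] / temperature       # similarity of every candidate to it
    logits = logits - logits.max()         # shift for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()      # softmax over candidates
    return weights @ actions, weights


# Toy data: 8 candidate actions (dim 3) for a single state.
rng = np.random.default_rng(0)
candidate_actions = rng.uniform(-1.0, 1.0, size=(8, 3))
candidate_embeddings = rng.normal(size=(8, 16))  # stand-in for a learned embedding
candidate_q = rng.normal(size=8)                 # stand-in for critic estimates

target_action, kernel_weights = similarity_weighted_target(
    candidate_actions, candidate_embeddings, candidate_q
)
print("kernel weights:", np.round(kernel_weights, 3))
print("target action:", np.round(target_action, 3))
```

In this sketch the weighted target action could act as a regression target for the policy, which is one plausible reading of "guiding policy improvement toward higher-value regions". How SAVGO actually turns the kernel into a policy update is not specified in the text above.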
