Geometry of Uncertainty: Learning Metric Spaces for Multimodal State Estimation in RL
Alfredo Reichlin, Adriano Pacciarelli, Danica Kragic, Miguel Vasco

TL;DR
This paper introduces a novel metric space approach for multimodal state estimation in reinforcement learning, enabling robust, noise-agnostic, and geometrically interpretable state representations that improve agent performance.
Contribution
It proposes a structured latent space where distances reflect transition costs, along with a multimodal transition model and sensor fusion method that do not require explicit noise assumptions.
Findings
Improved robustness to sensor noise in RL tasks
Enhanced state estimation accuracy over baseline methods
Better RL agent performance without explicit noise augmentation
Abstract
Estimating the state of an environment from high-dimensional, multimodal, and noisy observations is a fundamental challenge in reinforcement learning (RL). Traditional approaches rely on probabilistic models to account for the uncertainty, but often require explicit noise assumptions, in turn limiting generalization. In this work, we contribute a novel method to learn a structured latent representation, in which distances between states directly correlate with the minimum number of actions required to transition between them. The proposed metric space formulation provides a geometric interpretation of uncertainty without the need for explicit probabilistic modeling. To achieve this, we introduce a multimodal latent transition model and a sensor fusion mechanism based on inverse distance weighting, allowing for the adaptive integration of multiple sensor modalities without prior…
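To make the target quantity concrete, the following is a minimal sketch of the "minimum number of actions" between two states, computed by breadth-first search on a toy deterministic environment. The `transitions` dictionary, the `corridor` environment, and the function name are hypothetical illustrations, not taken from the paper; the paper learns a latent space whose distances correlate with this quantity rather than computing it by search.

```python
from collections import deque


def min_action_distance(transitions, start, goal):
    """Fewest actions needed to reach `goal` from `start` (None if unreachable).

    `transitions` is a hypothetical dict: state -> {action: next_state}.
    """
    frontier = deque([(start, 0)])
    visited = {start}
    while frontier:
        state, steps = frontier.popleft()
        if state == goal:
            return steps
        for next_state in transitions.get(state, {}).values():
            if next_state not in visited:
                visited.add(next_state)
                frontier.append((next_state, steps + 1))
    return None


# Toy corridor with 5 cells; actions move left or right, clamped at the walls.
corridor = {
    s: {"left": max(s - 1, 0), "right": min(s + 1, 4)} for s in range(5)
}

print(min_action_distance(corridor, 0, 3))
```

In high-dimensional, partially observed settings this search is not available, which is why a learned embedding that approximates it is useful.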
Peer Reviews
Decision·ICLR 2026 Poster
- The paper introduces a novel geometrical view for handling state representation and uncertainty in POMDPs. By reframing uncertainty in geometric terms, it bypasses the complexities and assumptions of traditional probabilistic models. This idea could inspire a new direction of research for handling partial observability and uncertainty in RL.
- The paper is well written and the methodology is sound. Including implementation details that could make future reproduction of the algorithm and empir
- The paper fails to cite previous work by Steccanella et al. (2022), "State Representation Learning for Goal-Conditioned Reinforcement Learning". That work appears to be the first to propose the minimum-action distance, and it motivates very similar objectives for learning an embedding space in which the distance between states approximates the minimum action distance, by leveraging local constraints and the useful upper-bound of the trajectory distan
The primary strength of this work is its core conceptual shift. Instead of relying on Bayesian filtering, which requires restrictive priors or generative models, METRICMM recasts uncertainty in geometric terms. The inverse distance weighting (IDW) fusion mechanism is a direct and elegant consequence of the geometric formulation. The dynamics prediction z^t acts as a reliable "anchor." Any sensor encoding that is geometrically distant from this anchor is naturally identified as noise and its c
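The anchor-based fusion described in this review can be sketched as follows. This is an illustrative version of inverse distance weighting under stated assumptions: the Euclidean metric, the `power` exponent, the `eps` stabilizer, and the function name `idw_fuse` are all hypothetical choices, and the excerpt does not specify the exact weighting used in the paper.

```python
import math


def idw_fuse(anchor, encodings, power=2.0, eps=1e-8):
    """Fuse per-modality latent encodings by inverse distance to an anchor.

    anchor:    latent predicted by the dynamics model (list of floats).
    encodings: one latent per sensor modality, same dimension as anchor.
    Returns (fused_latent, normalized_weights).
    """
    raw = []
    for z in encodings:
        d = math.dist(anchor, z)             # distance to the anchor
        raw.append(1.0 / (d ** power + eps)) # far encodings get small weight
    total = sum(raw)
    weights = [w / total for w in raw]
    dim = len(anchor)
    fused = [sum(w * z[i] for w, z in zip(weights, encodings)) for i in range(dim)]
    return fused, weights


anchor = [0.0, 0.0]
clean_sensor = [0.1, 0.0]   # close to the dynamics prediction
noisy_sensor = [5.0, 5.0]   # far from it, so it is down-weighted
fused, weights = idw_fuse(anchor, [clean_sensor, noisy_sensor])
print(weights)
```

With these toy values nearly all of the weight goes to the encoding near the anchor, so the fused estimate stays close to it without any explicit noise model.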
1) Naive Temporal Distance Loss: The paper's stated goal is to learn a space where distances correlate with the minimum number of actions required to transition between them. However, the actual loss function used is a massive oversimplification. This loss function does not model the minimum number of actions; it models all single-step transitions as equidistant. It forces the distance between any two consecutive states to be 1, regardless of the optimality of the action taken. This is a naive implemen
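The reviewer's concern can be illustrated with a small sketch. The loss below is an assumption reconstructed from the review's description (consecutive latents pushed to distance 1); the excerpt does not show the paper's actual loss, and `unit_step_loss` is a hypothetical name. The example shows that two different embeddings of the same three-state trajectory both reach zero loss while disagreeing on the distance between the endpoints, so the one-step constraint alone does not pin down multi-step distances.

```python
import math


def unit_step_loss(latents):
    """Mean squared deviation of consecutive latent distances from 1."""
    gaps = [math.dist(a, b) for a, b in zip(latents, latents[1:])]
    return sum((g - 1.0) ** 2 for g in gaps) / len(gaps)


# Two candidate embeddings of the same trajectory s0 -> s1 -> s2.
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
bent = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]

for name, latents in (("straight", straight), ("bent", bent)):
    loss = unit_step_loss(latents)
    endpoint_distance = math.dist(latents[0], latents[-1])
    print(name, loss, endpoint_distance)
```

Both embeddings satisfy the one-step constraint exactly, yet the endpoint distance is 2 in one and the square root of 2 in the other; additional constraints would be needed to decide which reflects the true minimum number of actions.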
State representation learning is an important problem in reinforcement learning, and in this sense the paper makes a timely contribution. It also looks as if the proposed approach performs well in practice compared to other algorithms.
I believe that several concepts are not clearly explained, which makes it difficult to accurately evaluate the contribution. Mainly for this reason my opinion regarding acceptance at ICLR is on the negative side. Apart from the cited paper by Wang et al., there are other works that explicitly learn a distance estimate between pairs of states. Concretely, the first work also measures distance as the minimum number of actions required to transition from one state to another. State Representation
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning
