Boosting Reinforcement Learning in 3D Visuospatial Tasks Through Human-Informed Curriculum Design

Markus D. Solbach; John K. Tsotsos

arXiv:2511.17595 · cs.LG · November 25, 2025

Open Access · 3 Reviews

TL;DR

This paper explores enhancing reinforcement learning in complex 3D visuospatial tasks by integrating human-informed curriculum design, demonstrating improved learning outcomes over traditional methods.

Contribution

It introduces a curriculum learning approach guided by human experiment insights to improve RL performance in complex 3D visuospatial tasks.

Findings

1. Curriculum learning significantly improved RL success rates.
2. Traditional RL methods struggled with the task without curriculum guidance.
3. Human insights effectively informed curriculum design.
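As a rough illustration of the general idea behind curriculum learning (not the authors' implementation, whose details this page does not give), the sketch below moves an agent to the next, harder stage once its recent success rate passes a threshold. The class name `CurriculumScheduler`, the stage labels, the threshold `promote_at`, and the `window` size are all hypothetical choices made for this example.

```python
from collections import deque


class CurriculumScheduler:
    """Hypothetical helper: advance through ordered stages on recent success."""

    def __init__(self, stages, promote_at=0.8, window=100):
        if not stages:
            raise ValueError("need at least one stage")
        self.stages = list(stages)      # ordered from easiest to hardest
        self.promote_at = promote_at    # assumed success-rate threshold
        self.window = window            # assumed number of recent episodes
        self.index = 0
        self.results = deque(maxlen=window)

    @property
    def stage(self):
        """Return the stage the agent should currently train on."""
        return self.stages[self.index]

    def record(self, success):
        """Log one episode outcome; return True if the agent was promoted."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.window
        rate = sum(self.results) / len(self.results)
        is_last = self.index == len(self.stages) - 1
        if window_full and rate >= self.promote_at and not is_last:
            self.index += 1
            self.results.clear()        # start fresh statistics on the new stage
            return True
        return False


# Usage: stage labels are placeholders, not the paper's actual curriculum.
scheduler = CurriculumScheduler(["easy", "medium", "hard"], promote_at=0.75, window=4)
for outcome in [True, True, True, True]:
    scheduler.record(outcome)
print(scheduler.stage)
```

In a human-informed curriculum of the kind the paper describes, the ordering of `stages` would come from observations of human behaviour rather than from an arbitrary easy-to-hard guess; how the authors derive and schedule their stages is not stated in this excerpt.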

Abstract

Reinforcement Learning is a mature technology, often suggested as a potential route towards Artificial General Intelligence, with the ambitious goal of replicating the wide range of abilities found in natural and artificial intelligence, including the complexities of human cognition. While RL has shown successes in relatively constrained environments, such as the classic Atari games and specific continuous control problems, recent years have seen efforts to expand its applicability. This work investigates the potential of RL in demonstrating intelligent behaviour and its progress in addressing more complex and less structured problem domains. We present an investigation into the capacity of modern RL frameworks in addressing a seemingly straightforward 3D Same-Different visuospatial task. While initial applications of state-of-the-art methods, including PPO, behavioural cloning and…

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 5

Strengths

1. The paper adapts a well-known visuospatial reasoning problem to the RL setting. The objective is well-defined, and the motivation for using this task to explore the limits of visual reasoning in RL agents is clear. The simple setup offers an intuitive and interpretable framework for examining how agents develop spatial understanding and active perception strategies. It’s refreshing to see a problem that feels genuine rather than over-engineered.
2. The curriculum learning results are insightf

Weaknesses

1. The paper does not clearly formalize the task as a POMDP. While the text describes the environment qualitatively, it never defines the state, observation, action, and reward functions that make this original binary classification task into a sequential decision-making problem. Without this formal grounding, it remains unclear what the agent is optimizing, how partial observability is handled, or how the final binary decision integrates into the trajectory-based reward structure.
2. The sparse

Reviewer 02 · Rating 4 · Confidence 3

Strengths

The experiments are interesting. To my knowledge the experimental platform is novel.

Weaknesses

- The results are somewhat unedifying. We see that curriculum learning does seem to help tackle more open environments. Furthermore, it seems that for this specific task, a certain tweak in the order of the curriculum, inspired by human experiments, may have slightly improved performance on 48-cell environments. It is not obvious what we can extrapolate from this for RL in general.
- More generally, as the authors acknowledge, comparisons with human behavior are extremely difficult due to th

Reviewer 03 · Rating 2 · Confidence 3

Strengths

- The same-different task itself is quite interesting and, to my knowledge, novel to solve in the realm of RL.
- The analysis in the discussion section is quite interesting, especially the plots in Figures 10 and 11, where the RL agent exploits different viewpoints from humans.
- Variations that increase the dimensionality of the same-different task are explored.
- The proposed method is simple, which can be seen as a strength but also as a weakness.

Weaknesses

- First of all, the writing of the manuscript can be improved, especially in the introduction section. For example, the relationship between this paper and reward design, as mentioned in the introduction, is unclear to me. Furthermore, some important details about the experiments and methods are not clearly described. For example, what is the loss of BC? One can assume it might be MSE, but it is not clearly written in the manuscript. Details about the curriculum learning are also not described, s

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Social Robot Interaction and HRI