# Poke and Strike: Learning Task-Informed Exploration Policies

**Authors:** Marina Y. Aoyama, Joao Moura, Juan Del Aguila Ferrandis, Sethu Vijayakumar

arXiv: 2509.00178 · 2025-09-03

## TL;DR

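This paper introduces a reinforcement-learning approach that trains a robot to explore an object only as long as needed to identify the physical properties that matter for the downstream task. On a puck-striking task it reaches a 90% success rate with an average exploration time under 1.2 seconds, whereas the baselines achieve at most 40% success or require inefficient querying and retraining in a simulator at test time.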

## Contribution

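The paper presents a task-informed exploration policy trained with rewards generated automatically from the sensitivity of a privileged task policy to errors in estimated properties, together with an uncertainty-based mechanism that decides when to switch from exploration to task execution.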
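
To make the idea of a sensitivity-based reward concrete, here is a minimal sketch under stated assumptions. It is not the paper's implementation: the task policy is a hypothetical toy (the launch speed a puck needs to slide a given distance under Coulomb friction), and the names `strike_speed` and `sensitivity_reward` are invented for illustration. The reward is the negative change in the task policy's action when it acts on estimated rather than true properties.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def strike_speed(friction: float, mass: float, distance: float) -> float:
    """Toy stand-in for a privileged task policy (an assumption, not the paper's).

    Returns the launch speed for a puck to slide `distance` metres. Under Coulomb
    friction the stopping distance is v^2 / (2 * mu * g), so mass cancels out.
    The mass parameter is kept only to show that its sensitivity is zero here.
    """
    return math.sqrt(2.0 * friction * G * distance)


def sensitivity_reward(true_props: dict, est_props: dict, distance: float) -> float:
    """Negative action error caused by acting on estimated instead of true properties."""
    a_true = strike_speed(distance=distance, **true_props)
    a_est = strike_speed(distance=distance, **est_props)
    return -abs(a_true - a_est)


# Hypothetical property values chosen purely for illustration.
true_props = {"friction": 0.20, "mass": 0.50}
friction_off = {"friction": 0.30, "mass": 0.50}  # 50% error in friction only
mass_off = {"friction": 0.20, "mass": 0.75}      # 50% error in mass only

r_friction = sensitivity_reward(true_props, friction_off, distance=2.0)
r_mass = sensitivity_reward(true_props, mass_off, distance=2.0)
```

In this toy model a friction error is penalized while a mass error of the same relative size is not, so an exploration policy trained on such a reward would have no incentive to spend time estimating mass. That ranking is a property of the toy model only and says nothing about which properties matter in the paper's striking task.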

## Key findings

- Achieves a 90% success rate on the striking task with an average exploration time under 1.2 seconds
- Outperforms baselines that either reach at most 40% success or require inefficient querying and retraining in a simulator at test time
- Identifies object properties and adjusts task execution on a physical KUKA iiwa robot arm

## Abstract

In many dynamic robotic tasks, such as striking pucks into a goal outside the reachable workspace, the robot must first identify the relevant physical properties of the object for successful task execution, as it is unable to recover from failure or retry without human intervention. To address this challenge, we propose a task-informed exploration approach, based on reinforcement learning, that trains an exploration policy using rewards automatically generated from the sensitivity of a privileged task policy to errors in estimated properties. We also introduce an uncertainty-based mechanism to determine when to transition from exploration to task execution, ensuring sufficient property estimation accuracy with minimal exploration time. Our method achieves a 90% success rate on the striking task with an average exploration time under 1.2 seconds, significantly outperforming baselines that achieve at most 40% success or require inefficient querying and retraining in a simulator at test time. Additionally, we demonstrate that our task-informed rewards capture the relative importance of physical properties in both the striking task and the classical CartPole example. Finally, we validate our approach by demonstrating its ability to identify object properties and adjust task execution in a physical setup using the KUKA iiwa robot arm.
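
The transition from exploration to task execution described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how uncertainty is computed, so this example swaps in a scalar Gaussian (Kalman-style) filter over a single property as an assumed stand-in. The function name `explore_until_confident`, the threshold, and the measurement values are all hypothetical.

```python
def explore_until_confident(observations, noise_var, prior_mean, prior_var, std_threshold):
    """Fuse noisy property measurements until the posterior std is small enough.

    Returns (mean, var, steps, confident). `steps` counts the exploration steps
    actually used; `confident` is False if the observations ran out first.
    """
    mean, var = prior_mean, prior_var
    steps = 0
    for z in observations:
        if var ** 0.5 <= std_threshold:
            break  # uncertainty is low enough: stop exploring
        gain = var / (var + noise_var)   # scalar Kalman gain
        mean = mean + gain * (z - mean)  # posterior mean update
        var = (1.0 - gain) * var         # posterior variance update
        steps += 1
    confident = var ** 0.5 <= std_threshold
    return mean, var, steps, confident


# Hypothetical noisy friction measurements gathered while poking the object.
measurements = [0.22, 0.18, 0.21, 0.19, 0.20, 0.20, 0.21, 0.19]

mean, var, steps, confident = explore_until_confident(
    measurements,
    noise_var=0.01,      # assumed measurement noise (std 0.1)
    prior_mean=0.0,
    prior_var=1.0,
    std_threshold=0.05,  # assumed accuracy needed before executing the task
)
```

With these assumed values the loop stops after four of the eight available measurements, which is the behaviour the mechanism is meant to provide: explore until the estimate is accurate enough, then hand over to the task policy without spending further time.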

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00178/full.md

## Figures

48 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00178/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/2509.00178/full.md

---
Source: https://tomesphere.com/paper/2509.00178