TL;DR
KnotGym is a minimalistic, interactive environment for evaluating complex spatial reasoning and manipulation through goal-oriented rope tasks that must be solved from image observations, stressing the integration of perception, reasoning, and manipulation.
Contribution
We introduce KnotGym, a scalable environment for spatial reasoning that uses image-based rope manipulation tasks with quantifiable complexity levels for benchmarking AI methods.
Findings
Model-based RL and model-predictive control (MPC) methods face significant challenges in KnotGym.
KnotGym effectively tests perception, reasoning, and manipulation integration.
The environment provides a scalable platform for future research in spatial reasoning.
Abstract
We propose KnotGym, an interactive environment for complex spatial reasoning and manipulation. KnotGym includes goal-oriented rope manipulation tasks with varying levels of complexity, all of which require acting from pure image observations. Tasks are defined along a clear and quantifiable axis of complexity based on the number of knot crossings, creating a natural generalization test. KnotGym has a simple observation space, allowing for scalable development, yet it highlights core challenges in integrating acute perception, spatial reasoning, and grounded manipulation. We evaluate methods of different classes, including model-based RL, model-predictive control, and chain-of-thought reasoning, and illustrate the challenges KnotGym presents. KnotGym is available at https://github.com/lil-lab/knotgym.
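To make the idea of a crossing-based complexity axis concrete, here is a minimal sketch that counts the crossings of a rope drawn as a 2D polyline (for example, a projected centerline). This is a hypothetical illustration, not KnotGym's code or its task definition. It assumes the rope is given as an ordered list of 2D points, ignores over/under information, and counts only proper intersections between non-adjacent segments. A single projection's count can exceed the knot's minimal crossing number.

```python
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    # Signed area of triangle (a, b, c): > 0 counter-clockwise, < 0 clockwise.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    # Proper intersection: each segment's endpoints lie strictly on
    # opposite sides of the other segment (collinear touches are ignored).
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def count_crossings(points: List[Point]) -> int:
    # Hypothetical helper: number of self-crossings in one 2D rope projection.
    segments = list(zip(points, points[1:]))
    total = 0
    for i, j in combinations(range(len(segments)), 2):
        if j == i + 1:
            continue  # consecutive segments share an endpoint, not a crossing
        if segments_cross(*segments[i], *segments[j]):
            total += 1
    return total


# A straight rope has no crossings; a single loop has one.
print(count_crossings([(0, 0), (1, 0), (2, 0)]))
print(count_crossings([(0, 0), (2, 2), (2, 0), (0, 2)]))
```

The pairwise check is O(n²) in the number of segments, which is adequate for a short illustrative polyline. A sweep-line algorithm would scale better for densely sampled ropes.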
