# Generative Visual Foresight Meets Task-Agnostic Pose Estimation in Robotic Table-Top Manipulation

**Authors:** Chuye Zhang, Xiaoxiong Zhang, Wei Pan, Linfang Zheng, Wei Zhang

arXiv: 2509.00361 · 2025-09-03

## TL;DR

This paper presents GVF-TAPE, a novel closed-loop framework combining generative visual foresight and task-agnostic pose estimation to enable scalable, real-time robotic manipulation across diverse tasks in unstructured environments.

## Contribution

The paper introduces GVF-TAPE, a scalable closed-loop framework that integrates generative visual foresight with task-agnostic pose estimation for adaptive robotic manipulation.

## Key findings

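- Reduces reliance on task-specific action data
- Achieves real-time, adaptive manipulation by iterating video foresight and pose estimation in a closed loop
- Generalizes effectively across tasks in both simulation and real-world experiments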

## Abstract

Robotic manipulation in unstructured environments requires systems that can generalize across diverse tasks while maintaining robust and reliable performance. We introduce GVF-TAPE, a closed-loop framework that combines generative visual foresight with task-agnostic pose estimation to enable scalable robotic manipulation. GVF-TAPE employs a generative video model to predict future RGB-D frames from a single side-view RGB image and a task description, offering visual plans that guide robot actions. A decoupled pose estimation model then extracts end-effector poses from the predicted frames, translating them into executable commands via low-level controllers. By iteratively integrating video foresight and pose estimation in a closed loop, GVF-TAPE achieves real-time, adaptive manipulation across a broad range of tasks. Extensive experiments in both simulation and real-world settings demonstrate that our approach reduces reliance on task-specific action data and generalizes effectively, providing a practical and scalable solution for intelligent robotic systems.
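
The closed loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`closed_loop_manipulation`, `observe`, `foresee`, `estimate_pose`, `move_to`, `is_done`, `max_replans`) is hypothetical, and the toy demo replaces images, RGB-D frames, and end-effector poses with a single scalar position so that the control flow can be run without a robot or a trained model.

```python
from typing import Any, Callable, List, Sequence


def closed_loop_manipulation(
    observe: Callable[[], Any],                    # returns the current camera image
    foresee: Callable[[Any, str], Sequence[Any]],  # image + task text -> predicted frames
    estimate_pose: Callable[[Any], Any],           # predicted frame -> end-effector pose
    move_to: Callable[[Any], None],                # low-level controller command
    is_done: Callable[[Any], bool],                # hypothetical termination check
    task: str,
    max_replans: int = 10,                         # assumed safety bound on replanning
) -> List[Any]:
    """Alternate between visual foresight and pose execution until done."""
    executed: List[Any] = []
    for _ in range(max_replans):
        image = observe()                  # re-observe the scene: this closes the loop
        if is_done(image):
            break
        for frame in foresee(image, task):  # short visual plan from the current image
            pose = estimate_pose(frame)     # pose extraction is decoupled from the task
            move_to(pose)
            executed.append(pose)
    return executed


def run_toy_demo() -> List[float]:
    """Toy stand-in: the 'scene' is one number, the gripper's x position."""
    state = {"x": 0.0}
    goal, step, horizon = 1.0, 0.25, 2

    def observe() -> float:
        return state["x"]

    def foresee(image: float, task: str) -> List[float]:
        # Stub for the generative video model: predicts `horizon` future
        # positions moving toward the goal, never overshooting it.
        return [min(goal, image + step * (k + 1)) for k in range(horizon)]

    def estimate_pose(frame: float) -> float:
        # Stub for the pose estimator: in this toy world the frame is the pose.
        return frame

    def move_to(pose: float) -> None:
        state["x"] = pose

    def is_done(image: float) -> bool:
        return image >= goal

    return closed_loop_manipulation(
        observe, foresee, estimate_pose, move_to, is_done, task="reach the goal"
    )


if __name__ == "__main__":
    print(run_toy_demo())  # [0.25, 0.5, 0.75, 1.0]
```

In the toy run, the loop replans twice with a two-frame horizon and stops on the third observation once the goal is reached. Passing the four stages in as callables mirrors the decoupling the abstract describes: the pose estimator receives only a frame, never the task description.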

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00361/full.md

## Figures

24 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00361/full.md

## References

67 references — full list in the complete paper: https://tomesphere.com/paper/2509.00361/full.md

---
Source: https://tomesphere.com/paper/2509.00361