ActionSink: Toward Precise Robot Manipulation with Dynamic Integration of Action Flow
Shanshan Guo, Xiwen Liang, Junfan Lin, Yuzheng Zhuang, Liang Lin, Xiaodan Liang

TL;DR
ActionSink is a robot manipulation framework that reformulates robot actions as action-caused optical flows ("action flows") and dynamically integrates retrieved flows into action estimation, improving success rate on the LIBERO benchmark by 7.9% over prior SOTA.
Contribution
It proposes a self-supervised action flow reformulation and a dynamic integration method, advancing low-level action estimation in learning-based robot manipulation.
Findings
Outperforms prior SOTA on the LIBERO benchmark by 7.9% in success rate.
Achieves a nearly 8% accuracy gain on LIBERO-Long.
Introduces a coarse-to-fine action flow matcher and dynamic memory integration.
Abstract
Language-instructed robot manipulation has garnered significant interest due to the potential of learning from collected data. While the challenges in high-level perception and planning are continually being addressed as general large pre-trained models progress, the low precision of low-level action estimation has emerged as the key limiting factor in manipulation performance. To this end, this paper introduces a novel robot manipulation framework, ActionSink, to pave the way toward precise action estimation in learning-based robot manipulation. As the name suggests, ActionSink reformulates robot actions as action-caused optical flows from videos, called "action flows", in a self-supervised manner; these flows are then retrieved and integrated to enhance action estimation. Specifically, ActionSink incorporates two primary modules. The first…
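The abstract describes action flows being retrieved and then integrated to refine the action estimate. As a rough illustration of that general retrieve-and-integrate pattern only, the sketch below keeps a small memory of (flow feature, action) pairs, retrieves the top-k entries most similar to a query flow feature, and blends their actions with softmax weights. This is not the paper's coarse-to-fine matcher or its dynamic integration module: the function names, the cosine-similarity retrieval, the softmax blending, and the parameters `k` and `temperature` are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_and_integrate(query, memory, k=2, temperature=0.1):
    """Blend the actions of the k memory entries most similar to `query`.

    `memory` is a list of (flow_feature, action) pairs. Both the memory
    layout and the softmax blending are hypothetical, not taken from the paper.
    """
    if not memory:
        raise ValueError("memory must contain at least one entry")

    # Retrieval: score every stored flow feature against the query, keep top-k.
    scored = sorted(
        ((cosine(query, feature), action) for feature, action in memory),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    # Integration: softmax over similarities (shifted by the max for stability).
    best = scored[0][0]
    weights = [math.exp((score - best) / temperature) for score, _ in scored]
    total = sum(weights)

    dim = len(scored[0][1])
    return [
        sum(w * action[i] for w, (_, action) in zip(weights, scored)) / total
        for i in range(dim)
    ]


# Toy usage: 2-D flow features paired with 2-D actions (made-up values).
example_memory = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([0.0, 1.0], [0.0, 1.0]),
    ([1.0, 0.1], [0.8, 0.0]),
]
blended_action = retrieve_and_integrate([1.0, 0.0], example_memory, k=2)
```

A lower `temperature` makes the blend lean toward the single best match, while a higher value averages the retrieved actions more evenly.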
