FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
Advait Gupta, Rishie Raj, Dang Nguyen, Tianyi Zhou

TL;DR
FaSTA* is a neurosymbolic agent that combines high-level planning with low-level search and subroutine reuse to perform efficient multi-turn image editing tasks, reducing computational costs.
Contribution
It introduces a cost-efficient, adaptive planning framework that leverages LLMs and subroutine mining to improve multi-turn image editing efficiency.
Findings
FaSTA* achieves higher computational efficiency than recent methods.
It maintains competitive success rates in complex image editing tasks.
Reusable subroutines significantly reduce exploration costs.
Abstract
We develop a cost-efficient neurosymbolic agent to address challenging multi-turn image editing tasks such as ``Detect the bench in the image while recoloring it to pink. Also, remove the cat for a clearer view and recolor the wall to yellow.'' It combines the fast, high-level subtask planning by large language models (LLMs) with the slow, accurate, tool-use, and local A$^*$ search per subtask to find a cost-efficient toolpath -- a sequence of calls to AI tools. To save the cost of A$^*$ on similar subtasks, we perform inductive reasoning on previously successful toolpaths via LLMs to continuously extract/refine frequently used subroutines and reuse them as new tools for future tasks in an adaptive fast-slow planning, where the higher-level subroutines are explored first, and only when they fail, the low-level A$^*$ search is activated. The reusable symbolic subroutines considerably…
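The fast-slow idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the tool graph, its costs, the heuristic table, the `works` check, and every name below are hypothetical, and the LLM-based subroutine mining is replaced here by a plain dictionary cache of previously successful toolpaths. It only shows the control flow of trying a stored subroutine first and activating A$^*$ search when none is available or the stored one fails.

```python
import heapq

# Hypothetical tool graph: node -> {next_node: cost of calling that tool}.
TOOL_GRAPH = {
    "input": {"detector_a": 1.0, "detector_b": 3.0},
    "detector_a": {"inpaint": 4.0},
    "detector_b": {"inpaint": 1.0},
    "inpaint": {"output": 0.5},
}

# Hypothetical admissible heuristic: a lower bound on the remaining cost.
HEURISTIC = {"input": 4.0, "detector_a": 4.0, "detector_b": 1.0,
             "inpaint": 0.5, "output": 0.0}


def a_star(graph, start, goal, heuristic):
    """Return (cost, path) for the cheapest start-to-goal path, or None."""
    frontier = [(heuristic[start], 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for nxt, step in graph.get(node, {}).items():
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic[nxt], new_cost, nxt, path + [nxt]),
                )
    return None


def plan_subtask(subtask, subroutines, works):
    """Fast path: reuse a stored toolpath. Slow path: run A* and store the result."""
    cached = subroutines.get(subtask)
    if cached is not None and works(cached):
        return cached, "fast"
    result = a_star(TOOL_GRAPH, "input", "output", HEURISTIC)
    if result is None:
        raise ValueError(f"no toolpath found for subtask {subtask!r}")
    _, path = result
    subroutines[subtask] = path  # stand-in for LLM-based subroutine mining
    return path, "slow"


subroutines = {}
first = plan_subtask("object removal", subroutines, works=lambda p: True)
second = plan_subtask("object removal", subroutines, works=lambda p: True)
```

In this toy setup the first call has nothing to reuse and runs the search, while the second call returns the stored toolpath without searching. Passing a `works` function that rejects the stored path sends the planner back to the slow path, which mirrors the fallback described in the abstract.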
