WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment
Jiefu Ou, Arda Uzunoglu, Benjamin Van Durme, Daniel Khashabi

TL;DR
This paper asks how many distinct APIs an embodied AI agent would need to perform a wide range of tasks. It proposes a framework that induces and analyzes these APIs from wikiHow instructions, and it highlights how few of them current simulators support.
Contribution
It introduces a novel framework for inducing and analyzing APIs for embodied agents using large language models and wikiHow data, emphasizing the need for richer simulated environments.
Findings
Induced over 300 APIs from a small subset of wikiHow tutorials.
Reused existing APIs and fabricated new ones to cover diverse tasks.
Current simulators support only a small fraction of these APIs.
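One way to make the "small fraction" finding concrete is to treat the induced APIs and a simulator's supported actions as two sets and measure their overlap. The sketch below is only an illustration: the function name `api_coverage` and every API name in it are hypothetical placeholders, not values from the paper.

```python
def api_coverage(induced, supported):
    """Fraction of induced APIs that a simulator also supports."""
    induced = set(induced)
    if not induced:
        return 0.0
    return len(induced & set(supported)) / len(induced)


# Hypothetical API names, chosen only for illustration.
induced_apis = {"find", "pick_up", "cut", "heat", "pour"}
simulator_apis = {"find", "pick_up", "open"}

coverage = api_coverage(induced_apis, simulator_apis)
print(coverage)  # 0.4
```

Exact name matching is a simplifying assumption here; two APIs with different names can describe the same action, so a real comparison would need some form of alignment between the two sets.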
Abstract
AI systems make decisions in physical environments through primitive actions or affordances that are accessed via API calls. While deploying AI agents in the real world involves numerous high-level actions, existing embodied simulators offer a limited set of domain-salient APIs. This naturally brings up the questions: how many primitive actions (APIs) are needed for a versatile embodied agent, and what should they look like? We explore this via a thought experiment: assuming that wikiHow tutorials cover a wide variety of human-written tasks, what is the space of APIs needed to cover these instructions? We propose a framework to iteratively induce new APIs by grounding wikiHow instructions to situated agent policies. Inspired by recent successes in large language models (LLMs) for embodied planning, we propose few-shot prompting to steer GPT-4 to generate Pythonic programs as agent…
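The iterative induction idea in the abstract can be sketched as a loop over generated programs that tracks which called functions are already in the API pool and which are new. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `called_functions` and `induce_apis`, the seed pool, and the example programs are all hypothetical, and hard-coded strings stand in for the programs an LLM would generate.

```python
import ast


def called_functions(program: str) -> set[str]:
    """Return the names of functions called by bare name in a Python program."""
    tree = ast.parse(program)
    return {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }


def induce_apis(programs, seed_pool):
    """Grow an API pool by scanning programs one at a time.

    For each program, record which APIs were reused from the pool
    and which were newly introduced, then add the new ones to the pool.
    """
    pool = set(seed_pool)
    history = []
    for program in programs:
        used = called_functions(program)
        reused = used & pool
        new = used - pool
        pool |= new
        history.append((sorted(reused), sorted(new)))
    return pool, history


# Hypothetical programs standing in for LLM output; they are parsed, never run.
programs = [
    "find('knife')\npick_up('knife')\ncut('apple')",
    "find('pan')\npick_up('pan')\nheat('pan')",
]

pool, history = induce_apis(programs, seed_pool={"find", "pick_up"})
for reused, new in history:
    print("reused:", reused, "new:", new)
```

Parsing with `ast` rather than executing the programs keeps the sketch safe to run on untrusted generated code, at the cost of only seeing calls made by bare function name.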
Taxonomy
Topics: Semantic Web and Ontologies · Mobile Agent-Based Network Management
Methods: Attention Is All You Need · Sparse Evolutionary Training · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Adam · Dropout · Multi-Head Attention · Dense Connections
