TL;DR
UniDomain pre-trains a unified PDDL (Planning Domain Definition Language) domain from real-world robot manipulation demonstrations and applies it to complex robot task planning, generalizing zero-shot to unseen tasks and achieving up to 58% higher task success than baselines.
Contribution
UniDomain introduces a framework that extracts atomic domains from manipulation videos and unifies them into a single domain to support compositional generalization in robotic planning.
Findings
Achieves up to 58% higher task success rate
Improves plan optimality by 160% over baselines
Supports zero-shot generalization to unseen tasks
Abstract
Robotic task planning in real-world environments requires reasoning over implicit constraints from language and vision. While LLMs and VLMs offer strong priors, they struggle with long-horizon structure and symbolic grounding. Existing methods that combine LLMs with symbolic planning often rely on handcrafted or narrow domains, limiting generalization. We propose UniDomain, a framework that pre-trains a PDDL domain from robot manipulation demonstrations and applies it for online robotic task planning. It extracts atomic domains from 12,393 manipulation videos to form a unified domain with 3,137 operators, 2,875 predicates, and 16,481 causal edges. Given a target class of tasks, it retrieves relevant atomic domains from the unified domain and systematically fuses them into high-quality meta-domains to support compositional generalization in planning. Experiments on diverse real-world tasks show…
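The retrieve-then-fuse step described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the `Operator` and `AtomicDomain` classes, the word-overlap retrieval score, and the merge-by-operator-name fusion rule are all assumptions made here for illustration, and the three example domains are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    # Hypothetical, simplified stand-in for a PDDL action schema.
    name: str
    preconditions: frozenset
    effects: frozenset

@dataclass
class AtomicDomain:
    # Hypothetical: one small domain extracted from one demonstration.
    task: str
    operators: list

def retrieve(domains, query_words, k):
    """Assumed retrieval rule: rank by word overlap with the task query."""
    def score(d):
        return len(set(d.task.lower().split()) & query_words)
    ranked = sorted(domains, key=score, reverse=True)
    return [d for d in ranked[:k] if score(d) > 0]

def fuse(domains):
    """Assumed fusion rule: merge operators that share a name by taking
    the union of their preconditions and effects."""
    merged = {}
    for d in domains:
        for op in d.operators:
            prev = merged.get(op.name)
            if prev is None:
                merged[op.name] = op
            else:
                merged[op.name] = Operator(
                    op.name,
                    prev.preconditions | op.preconditions,
                    prev.effects | op.effects,
                )
    predicates = set()
    for op in merged.values():
        predicates |= op.preconditions | op.effects
    return merged, predicates

# Invented example data, not taken from the paper.
library = [
    AtomicDomain("pick up cup", [Operator(
        "pick", frozenset({"hand-empty", "on-table"}), frozenset({"holding"}))]),
    AtomicDomain("open drawer", [Operator(
        "open", frozenset({"hand-empty", "closed"}), frozenset({"opened"}))]),
    AtomicDomain("place cup in drawer", [Operator(
        "place", frozenset({"holding", "opened"}),
        frozenset({"in-drawer", "hand-empty"}))]),
    AtomicDomain("wipe table", [Operator(
        "wipe", frozenset({"holding-cloth"}), frozenset({"clean"}))]),
]

selected = retrieve(library, {"cup", "drawer"}, k=3)
operators, predicates = fuse(selected)
print(sorted(operators))
print(sorted(predicates))
```

In this sketch the "wipe table" domain shares no words with the query and is left out, so the fused meta-domain contains only the operators relevant to putting a cup in a drawer. A symbolic planner would then search over this smaller operator set instead of the full unified domain.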