Grounding Predicates through Actions
Toki Migimatsu, Jeannette Bohg

TL;DR
This paper introduces a weakly supervised method for automatically labeling symbolic states in videos using action pre- and post-conditions, enabling efficient training of predicate classifiers for robotic reasoning.
Contribution
The paper presents a novel automatic labeling approach that reduces supervision costs and uses it to train predicate classifiers for symbolic reasoning in robotics.
Findings
Predicate classifiers trained on automatically generated labels match the performance of fully supervised classifiers
Automatic labeling significantly reduces annotation effort
Enables real-world task planning with learned predicates
Abstract
Symbols representing abstract states such as "dish in dishwasher" or "cup on table" allow robots to reason over long horizons by hiding details unnecessary for high-level planning. Current methods for learning to identify symbolic states in visual data require large amounts of labeled training data, but manually annotating such datasets is prohibitively expensive due to the combinatorial number of predicates in images. We propose a novel method for automatically labeling symbolic states in large-scale video activity datasets by exploiting known pre- and post-conditions of actions. This automatic labeling scheme only requires weak supervision in the form of an action label that describes which action is demonstrated in each video. We use our framework to train predicate classifiers to identify symbolic relationships between objects when prompted with object bounding boxes, and…
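The labeling idea in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes STRIPS-style actions: the preconditions label the first frame of a video, and the preconditions overridden by the effects label the last frame. Predicates the action does not mention stay unknown, and training uses a cross-entropy that skips unknown entries. The `Action` class, `label_video`, `masked_bce`, and the predicate strings are all hypothetical.

```python
import math
from dataclasses import dataclass

UNKNOWN = -1  # label for predicates the action says nothing about


# Hypothetical STRIPS-style action schema (assumption, not the paper's format).
@dataclass
class Action:
    name: str
    pre: dict[str, bool]   # predicate -> required truth value before the action
    post: dict[str, bool]  # predicate -> truth value after the action (effects)


def label_video(action, predicates):
    """Return (first_frame_labels, last_frame_labels) as lists of 1 / 0 / UNKNOWN."""
    def encode(conds):
        return [int(conds[p]) if p in conds else UNKNOWN for p in predicates]

    # Assumed STRIPS semantics: preconditions untouched by the effects still hold
    # after the action; effects override the preconditions they change.
    after = {**action.pre, **action.post}
    return encode(action.pre), encode(after)


def masked_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over labeled predicates; UNKNOWN entries are skipped."""
    terms = []
    for p, y in zip(probs, labels):
        if y == UNKNOWN:
            continue
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        terms.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(terms) / len(terms) if terms else 0.0


# Usage with a hypothetical "place dish in dishwasher" action.
predicates = ["in(dish, hand)", "in(dish, dishwasher)", "on(cup, table)"]
place = Action(
    name="place(dish, dishwasher)",
    pre={"in(dish, hand)": True, "in(dish, dishwasher)": False},
    post={"in(dish, hand)": False, "in(dish, dishwasher)": True},
)
pre_labels, post_labels = label_video(place, predicates)
# pre_labels == [1, 0, -1], post_labels == [0, 1, -1]
```

Here the single action label yields partial labels for two frames. "on(cup, table)" stays unlabeled because the action neither requires nor changes it, which is why the loss masks it out instead of guessing a value.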
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Robot Manipulation and Learning
