Towards Logic-Aware Manipulation: A Knowledge Primitive for VLM-Based Assistants in Smart Manufacturing
Suchang Chen, Daqiang Guo

TL;DR
This paper introduces a formal manipulation-logic schema τ for vision-language models in manufacturing, enabling better handling of contact-rich actions through object-centric knowledge representation and reasoning.
Contribution
It formalizes an object-centric manipulation-logic schema τ as a knowledge primitive for VLM-based assistants, supporting planning, data augmentation, and retrieval in manufacturing tasks.
Findings
Schema improves VLM planning quality in manufacturing tasks.
Knowledge base enhances data augmentation and retrieval.
Demonstrated on a 3D-printer spool-removal task.
Abstract
Existing pipelines for vision-language models (VLMs) in robotic manipulation prioritize broad semantic generalization from images and language, but typically omit execution-critical parameters required for contact-rich actions in manufacturing cells. We formalize an object-centric manipulation-logic schema, serialized as an eight-field tuple τ, which exposes object, interface, trajectory, tolerance, and force/impedance information as a first-class knowledge signal between human operators, VLM-based assistants, and robot controllers. We instantiate τ and a small knowledge base (KB) on a 3D-printer spool-removal task in a collaborative cell, and analyze τ-conditioned VLM planning using plan-quality metrics adapted from recent VLM/LLM planning benchmarks, while demonstrating how the same schema supports taxonomy-tagged data augmentation at training time and logic-aware…
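As a rough illustration of what an object-centric record like τ could look like in code, the sketch below covers only the five kinds of information the abstract names (object, interface, trajectory, tolerance, force/impedance). The actual eight-field layout is not given in this summary, so the class name, field names, types, units, and the spool values are all hypothetical placeholders, not the authors' schema.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class ManipulationLogic:
    """Hypothetical, partial stand-in for tau (five of the eight fields)."""
    object_name: str      # what is being manipulated
    interface: str        # where and how the object is contacted or mounted
    trajectory: str       # qualitative motion description
    tolerance_mm: float   # assumed positional tolerance, in millimetres
    max_force_n: float    # assumed force limit, in newtons


def to_prompt_context(entry: ManipulationLogic) -> str:
    """Serialize a record to JSON so it can be appended to a VLM prompt."""
    return json.dumps(asdict(entry), sort_keys=True)


# Placeholder values for the spool-removal task; not taken from the paper.
spool = ManipulationLogic(
    object_name="filament spool",
    interface="spool holder axle",
    trajectory="slide along axle axis",
    tolerance_mm=2.0,
    max_force_n=15.0,
)
context = to_prompt_context(spool)
```

Serializing to a flat JSON string is only one option; it keeps the record readable by operators, easy to store in a small knowledge base, and simple to pass to a planner as text.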
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI
