Bridging Bots: from Perception to Action via Multimodal-LMs and Knowledge Graphs
Margherita Martorana, Francesca Urgese, Mark Adamik, Ilaria Tiddi

TL;DR
This paper introduces a neurosymbolic framework that combines multimodal language models and knowledge graphs to improve perception and interoperability in domestic service robots, addressing the limitations of existing proprietary systems.
Contribution
It proposes a novel integration of multimodal LLMs with structured knowledge graphs to generate ontology-compliant representations for robotic applications.
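The summary does not describe the pipeline itself, so the following is only a minimal sketch of what "ontology-compliant" output can mean in practice. It assumes the multimodal LM has been prompted to return a JSON scene description (entities and relations). The namespace `http://example.org/home#`, the toy classes and properties, the JSON schema, and the helper names are all hypothetical. A hand-rolled domain/range check stands in for a real OWL reasoner or SHACL validator.

```python
"""Minimal sketch (hypothetical, not the paper's pipeline): map a multimodal
LM's JSON scene description to ontology-compliant RDF triples."""
import json

EX = "http://example.org/home#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Toy ontology (hypothetical): classes, subclass links, property domain/range.
CLASSES = {"Room", "Object", "Cup", "Table"}
SUBCLASS_OF = {"Cup": "Object", "Table": "Object"}
PROPERTIES = {
    "locatedIn": ("Object", "Room"),  # (domain, range)
    "onTopOf": ("Object", "Object"),
}


def is_a(cls, target):
    """Follow the subclass chain to test whether cls is (a subclass of) target."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def to_triples(model_output):
    """Keep entities with known classes and relations whose domain/range fit."""
    data = json.loads(model_output)
    types, triples, rejected = {}, [], []
    for ent in data.get("entities", []):
        if ent["type"] in CLASSES:
            types[ent["id"]] = ent["type"]
            triples.append((ent["id"], "rdf:type", ent["type"]))
    for rel in data.get("relations", []):
        s, p, o = rel["subject"], rel["predicate"], rel["object"]
        spec = PROPERTIES.get(p)
        ok = (
            spec is not None
            and s in types
            and o in types
            and is_a(types[s], spec[0])
            and is_a(types[o], spec[1])
        )
        (triples if ok else rejected).append((s, p, o))
    return triples, rejected


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        pred = RDF_TYPE if p == "rdf:type" else EX + p
        lines.append(f"<{EX}{s}> <{pred}> <{EX}{o}> .")
    return "\n".join(lines)


# Hypothetical model output for an image of a kitchen; the last relation
# violates the ontology (a Room is not an Object, so it cannot be onTopOf).
SAMPLE_OUTPUT = json.dumps({
    "entities": [
        {"id": "cup1", "type": "Cup"},
        {"id": "table1", "type": "Table"},
        {"id": "kitchen", "type": "Room"},
    ],
    "relations": [
        {"subject": "cup1", "predicate": "onTopOf", "object": "table1"},
        {"subject": "table1", "predicate": "locatedIn", "object": "kitchen"},
        {"subject": "kitchen", "predicate": "onTopOf", "object": "cup1"},
    ],
})

if __name__ == "__main__":
    kept, dropped = to_triples(SAMPLE_OUTPUT)
    print(to_ntriples(kept))
    print("rejected:", dropped)
```

Running the script prints five N-Triples lines (three type assertions, two relations) and reports the one rejected relation. Filtering against the ontology at this stage means noisy model output never reaches the knowledge graph. The trade-off is that anything outside the schema is silently dropped, so a real system would likely log or re-prompt on rejections.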
Findings
GPT-o1 and LLaMA 4 Maverick outperform other models in KG generation
Newer models do not always yield better results
Integration strategy critically affects KG quality
Abstract
Personal service robots are deployed to support daily living in domestic environments, particularly for elderly people and individuals requiring assistance. These robots must perceive complex and dynamic surroundings, understand tasks, and execute context-appropriate actions. However, current systems rely on proprietary, hard-coded solutions tied to specific hardware and software, resulting in siloed implementations that are difficult to adapt and scale across platforms. Ontologies and Knowledge Graphs (KGs) offer a way to enable interoperability across systems through structured and standardized representations of knowledge and reasoning. However, symbolic systems such as KGs and ontologies struggle with raw and noisy sensory input. In contrast, multimodal language models are well suited for interpreting input such as images and natural language, but often lack transparency,…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
