Bridging Bots: from Perception to Action via Multimodal-LMs and Knowledge Graphs

Margherita Martorana; Francesca Urgese; Mark Adamik; Ilaria Tiddi

arXiv:2507.09617·cs.AI·July 15, 2025

Open Access

TL;DR

This paper introduces a neurosymbolic framework that combines multimodal language models with knowledge graphs to improve perception and cross-platform interoperability in domestic service robots, addressing the limitations of existing proprietary, hard-coded systems.

Contribution

It proposes a novel integration of multimodal LLMs with structured knowledge graphs to generate ontology-compliant representations for robotic applications.
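To make "ontology-compliant" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a language model has already proposed candidate triples, and it filters them against a toy ontology that declares a domain and range class for each predicate. The ontology, the entity types, and the names `Triple`, `is_compliant`, and `filter_compliant` are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical toy ontology: predicate -> (domain class, range class).
ONTOLOGY = {
    "locatedOn": ("Object", "Surface"),
    "performs": ("Agent", "Action"),
}

# Hypothetical class assignment for each entity that may appear in a triple.
ENTITY_TYPES = {
    "cup": "Object",
    "table": "Surface",
    "robot": "Agent",
    "grasp": "Action",
}


def is_compliant(triple: Triple) -> bool:
    """Return True if the predicate is known and subject/object match its domain/range."""
    if triple.predicate not in ONTOLOGY:
        return False
    domain, range_ = ONTOLOGY[triple.predicate]
    return (
        ENTITY_TYPES.get(triple.subject) == domain
        and ENTITY_TYPES.get(triple.obj) == range_
    )


def filter_compliant(triples: list[Triple]) -> list[Triple]:
    """Keep only the triples that satisfy the toy ontology, preserving order."""
    return [t for t in triples if is_compliant(t)]


# Candidate triples standing in for model output (made up for this example).
candidates = [
    Triple("cup", "locatedOn", "table"),
    Triple("table", "locatedOn", "cup"),  # subject/object violate domain/range
    Triple("robot", "likes", "cup"),      # predicate not defined in the ontology
    Triple("robot", "performs", "grasp"),
]
accepted = filter_compliant(candidates)
```

A real system would more likely express such constraints in a standard ontology language and validate with a dedicated reasoner or shape checker; the dictionary lookup here only illustrates the idea of rejecting model output that falls outside the schema.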

Findings

01 GPT-o1 and LLaMA 4 Maverick outperform other models in KG generation

02 Newer models do not always yield better results

03 Integration strategy critically affects KG quality

Abstract

Personal service robots are deployed to support daily living in domestic environments, particularly for the elderly and for individuals requiring assistance. These robots must perceive complex and dynamic surroundings, understand tasks, and execute context-appropriate actions. However, current systems rely on proprietary, hard-coded solutions tied to specific hardware and software, resulting in siloed implementations that are difficult to adapt and scale across platforms. Ontologies and Knowledge Graphs (KGs) offer a way to enable interoperability across systems through structured and standardized representations of knowledge and reasoning. However, symbolic systems such as KGs and ontologies struggle with raw and noisy sensory input. In contrast, multimodal language models are well suited for interpreting input such as images and natural language, but often lack transparency,…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques