Benchmarking Local Language Models for Social Robots using Edge Devices

Dorian Lamouille; Matevž B. Zorec; Farnaz Baksh; Karl Kruusamäe

arXiv:2605.03111 · cs.RO · May 6, 2026

TL;DR

This study systematically benchmarks 25 open-source language models on edge devices for social robots, evaluating efficiency, knowledge, and teaching effectiveness to inform deployment strategies.

Contribution

The paper compares language models for pedagogical social robots, highlights the trade-offs between them, and proposes a three-tier inference architecture for resource-limited hardware.

Findings

1. Granite4 Tiny Hybrid (7B) balances speed, energy, and accuracy.
2. MMLU accuracy across models ranges from near-random (25% on four-option questions) to 57.2%.
3. Teaching effectiveness does not correlate directly with inference efficiency or general-knowledge (MMLU) accuracy.
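
The summary does not say how the absence of correlation was quantified. As a minimal sketch, assuming per-model scores are available as parallel lists, Spearman's rank correlation is one standard way to check whether teaching-effectiveness scores move with throughput or MMLU accuracy; the same routine could compare LLM-assigned ratings against human ratings. The function names below are hypothetical, and this is not presented as the authors' method.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    return pearson(average_ranks(x), average_ranks(y))
```

Ranking first makes the measure insensitive to the very different scales of tokens per second, accuracy, and rating scores; a rho near zero across models would be consistent with the finding above.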

Abstract

Social-educational robots designed for socially interactive pedagogical support, such as the Robot Study Companion (RSC), rely on responsive, privacy-preserving interaction despite severely limited compute. However, language models have not been systematically benchmarked for edge computing in pedagogical applications. This paper benchmarks 25 open-source language models for local deployment on edge hardware. We evaluate each model across three dimensions: inference efficiency (tokens per second, energy consumption), general knowledge (a six-category MMLU subset), and teaching effectiveness (LLM-rated pedagogical quality, validated against five independent human raters). The Raspberry Pi 4 (RPi4) serves as the primary platform, with additional comparisons on the RPi5 and a laptop GPU. Results reveal pronounced trade-offs: throughput and energy efficiency vary by over an order of…
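
The abstract names tokens per second and energy consumption as efficiency metrics but does not describe how they are computed. A minimal sketch, assuming an external power meter that logs timestamped wattage during generation: throughput is generated tokens divided by generation time, and energy is the time integral of sampled power (trapezoidal rule), normalized here to joules per token. The `RunMeasurement` fields and function names are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunMeasurement:
    """One generation run (hypothetical field names, not from the paper)."""
    generated_tokens: int
    gen_seconds: float              # wall-clock time spent generating
    power_times_s: Sequence[float]  # timestamps of power samples, in seconds
    power_watts: Sequence[float]    # power readings at those timestamps, in W


def energy_joules(times: Sequence[float], watts: Sequence[float]) -> float:
    """Integrate sampled power over time with the trapezoidal rule (W*s = J)."""
    if len(times) != len(watts) or len(times) < 2:
        raise ValueError("need >= 2 matching power samples")
    total = 0.0
    for t0, t1, p0, p1 in zip(times, times[1:], watts, watts[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def efficiency(run: RunMeasurement) -> dict[str, float]:
    """Throughput and energy metrics for a single run."""
    if run.generated_tokens <= 0 or run.gen_seconds <= 0:
        raise ValueError("need a positive token count and duration")
    energy = energy_joules(run.power_times_s, run.power_watts)
    return {
        "tokens_per_s": run.generated_tokens / run.gen_seconds,
        "joules": energy,
        "joules_per_token": energy / run.generated_tokens,
    }
```

Normalizing energy per token rather than per run keeps models that produce responses of different lengths comparable on the same device.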