Pedagogical Alignment for Vision-Language-Action Models: A Comprehensive Framework for Data, Architecture, and Evaluation in Education

Unggi Lee; Jahyun Jeong; Sunyoung Shin; Haeun Park; Jeongsu Moon; Youngchang Song; Jaechang Shim; JaeHwan Lee; Yunju Noh; Seungwon Choi; Ahhyun Kim; TaeHyeon Kim; Kyungtae Joo; Taeyeong Kim; Gyeonggeon Lee

arXiv:2601.13876·cs.CL·January 21, 2026

Open Access

TL;DR

This paper introduces a Pedagogical VLA Framework that adapts lightweight vision-language-action models for science education, enhancing safety, pedagogical quality, and explanation generation in resource-constrained settings.

Contribution

It proposes a comprehensive framework combining text healing, LLM distillation, safety training, and pedagogical evaluation to improve VLA models for educational use.

Findings

01 Achieves comparable task performance to baseline models

02 Produces contextually appropriate educational explanations

03 Enhances safety and pedagogical quality in science demonstrations

Abstract

Science demonstrations are important for effective STEM education, yet teachers struggle to conduct them safely and consistently across repeated sessions, a setting where robotics can help. However, current Vision-Language-Action (VLA) models require substantial computational resources and sacrifice language generation capabilities to maximize efficiency, making them unsuitable for resource-constrained educational settings that require interpretable, explanation-generating systems. We present the Pedagogical VLA Framework, which applies pedagogical alignment to lightweight VLA models through four components: text healing to restore language generation capabilities, large language model (LLM) distillation to transfer pedagogical knowledge, safety training for educational environments, and pedagogical evaluation adapted to science education contexts. We evaluate…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Machine Learning in Materials Science