Limited Linguistic Diversity in Embodied AI Datasets

Selma Wanna; Agnes Luhtaru; Jonathan Salfity; Ryan Barron; Juston Moore; Cynthia Matuszek; Mitch Pryor

arXiv:2601.03136·cs.CL·April 29, 2026


TL;DR

This paper systematically audits widely used Vision-Language-Action (VLA) datasets and finds that their instructions are repetitive and template-like, which narrows the language signal available for model training and evaluation.

Contribution

The paper analyzes the linguistic characteristics of VLA datasets along four dimensions (lexical variety, duplication and overlap, semantic similarity, and syntactic complexity), documents their lack of diversity, and suggests improvements for dataset design.

Findings

01. Datasets rely on highly repetitive commands

02. Limited structural variation in instructions

03. Narrow distribution of instruction forms

Abstract

Language plays a critical role in Vision-Language-Action (VLA) models, yet the linguistic characteristics of the datasets used to train and evaluate these systems remain poorly documented. In this work, we present a systematic dataset audit of several widely used VLA corpora, aiming to characterize what kinds of instructions these datasets actually contain and how much linguistic variety they provide. We quantify instruction language along complementary dimensions, including lexical variety, duplication and overlap, semantic similarity, and syntactic complexity. Our analysis shows that many datasets rely on highly repetitive, template-like commands with limited structural variation, yielding a narrow distribution of instruction forms. We position these findings as descriptive documentation of the language signal available in current VLA training and evaluation data, intended to support…
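
To make two of the dimensions named in the abstract concrete, the sketch below computes simple stand-ins for lexical variety and for duplication and overlap on a list of instructions. It is an illustration only, not the authors' method: the page does not state which metrics the paper uses, so the type-token ratio, the exact-duplicate rate, and the mean pairwise Jaccard overlap are all assumptions, as are the whitespace tokenizer, the function names, and the toy instructions.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer, chosen for simplicity.
    return text.lower().split()


def type_token_ratio(instructions):
    # Lexical variety proxy: unique tokens divided by total tokens.
    tokens = [tok for ins in instructions for tok in tokenize(ins)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def duplicate_rate(instructions):
    # Fraction of instructions that repeat another one after case and
    # whitespace normalization.
    normalized = [" ".join(tokenize(ins)) for ins in instructions]
    if not normalized:
        return 0.0
    return 1.0 - len(Counter(normalized)) / len(normalized)


def mean_pairwise_jaccard(instructions):
    # Overlap proxy: average Jaccard similarity of token sets over all pairs.
    token_sets = [set(tokenize(ins)) for ins in instructions]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 0.0
    scores = [len(a & b) / len(a | b) if (a | b) else 0.0 for a, b in pairs]
    return sum(scores) / len(scores)


# Hypothetical toy instructions, not drawn from any dataset in the paper.
toy_instructions = [
    "pick up the red block",
    "pick up the blue block",
    "Pick up the red block",
    "open the drawer",
]

if __name__ == "__main__":
    print("type-token ratio:", type_token_ratio(toy_instructions))
    print("duplicate rate:", duplicate_rate(toy_instructions))
    print("mean pairwise Jaccard:", mean_pairwise_jaccard(toy_instructions))
```

On template-like data, a low type-token ratio combined with a high duplicate rate or high pairwise overlap would be consistent with the repetition the abstract describes. Semantic similarity and syntactic complexity would need an embedding model and a parser respectively, so they are left out of this sketch.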
