ASMR: Augmenting Life Scenario using Large Generative Models for Robotic Action Reflection

Shang-Chi Tsai; Seiya Kawano; Angel Garcia Contreras; Koichiro Yoshino; Yun-Nung Chen

arXiv:2506.13956·cs.CL·June 18, 2025

Open Access

TL;DR

This paper presents a data augmentation framework using large generative models to improve multimodal classification for robotic assistance, significantly enhancing action prediction accuracy with limited data.

Contribution

The novel framework combines large language models and diffusion-based image generation to augment training data for robotic action reflection tasks.

Findings

01 Achieved state-of-the-art performance on real-world datasets.

02 Enhanced multimodal model accuracy with limited target data.

03 Demonstrated effective use of generative models for data augmentation.

Abstract

When designing robots to assist in everyday human activities, it is crucial to enhance user requests with visual cues from their surroundings for improved intent understanding. This process is defined as a multimodal classification task. However, gathering a large-scale dataset encompassing both visual and linguistic elements for model training is challenging and time-consuming. To address this issue, our paper introduces a novel framework focusing on data augmentation in robotic assistance scenarios, encompassing both dialogues and related environmental imagery. This approach involves leveraging a sophisticated large language model to simulate potential conversations and environmental contexts, followed by the use of a stable diffusion model to create images depicting these environments. The additionally generated data serves to refine the latest multimodal models, enabling them to…
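
The augmentation loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the `augment` function, the `AugmentedExample` record, the example action labels, and the stub generators are all hypothetical, and a real setup would replace the stubs with calls to an actual large language model and a diffusion-based image generator.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces: any text-in/text-out LLM and any text-to-image model.
TextGenerator = Callable[[str], str]
ImageGenerator = Callable[[str], bytes]


@dataclass
class AugmentedExample:
    """One synthetic training sample: dialogue text, scene image, target action."""
    utterance: str
    scene_description: str
    image: bytes
    action_label: str


def augment(
    action_labels: List[str],
    samples_per_label: int,
    llm: TextGenerator,
    image_model: ImageGenerator,
) -> List[AugmentedExample]:
    """Generate synthetic (utterance, image, action) triples for each action label."""
    examples = []
    for label in action_labels:
        for i in range(samples_per_label):
            # Step 1: the LLM simulates a user request for this robot action.
            # The prompt text is an assumption made for illustration.
            utterance = llm(
                f"Write user request variant {i} that a robot should "
                f"answer with the action: {label}"
            )
            # Step 2: the LLM describes the environment around the user.
            scene = llm(f"Describe the surroundings of a user who says: {utterance}")
            # Step 3: the image model renders the described environment.
            image = image_model(scene)
            examples.append(AugmentedExample(utterance, scene, image, label))
    return examples


def stub_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; echoes the prompt."""
    return f"[generated text for: {prompt}]"


def stub_image_model(description: str) -> bytes:
    """Placeholder for a real diffusion model; returns the description as bytes."""
    return description.encode("utf-8")


# Illustrative action labels only; the paper's label set is not given here.
synthetic = augment(["bring a towel", "open the window"], 2, stub_llm, stub_image_model)
```

The resulting list would then be mixed with the limited real data when fine-tuning a multimodal classifier. Passing the generators in as callables keeps the sketch independent of any particular model or library.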

Taxonomy

Topics: Context-Aware Activity Recognition Systems

Methods: Diffusion