Crafting Dynamic Virtual Activities with Advanced Multimodal Models

Changyang Li; Qingan Yan; Minyoung Kim; Zhan Li; Yi Xu; Lap-Fai Yu

arXiv:2406.17582 · cs.HC · November 13, 2025

TL;DR

This paper explores using multimodal large language models to generate realistic, context-aware virtual activities by interpreting virtual environments and orchestrating character interactions for enhanced simulation realism.

Contribution

The paper introduces a structured framework that uses MLLMs' multimodal reasoning to generate adaptive, contextually relevant virtual activities with detailed character interactions.

Findings

1. Effective interpretation of scene elements and contexts
2. Accurate positioning and behavior of virtual characters
3. Enhanced realism and contextual relevance in virtual environments

Abstract

In this paper, we investigate the use of multimodal large language models (MLLMs) for generating virtual activities, leveraging the integration of vision-language modalities to enable the interpretation of virtual environments. Our approach recognizes and abstracts key scene elements including scene layouts, semantic contexts, and object identities with MLLMs' multimodal reasoning capabilities. By correlating these abstractions with massive knowledge about human activities, MLLMs are capable of generating adaptive and contextually relevant virtual activities. We propose a structured framework to articulate abstract activity descriptions, emphasizing detailed multi-character interactions within virtual spaces. Utilizing the derived high-level contexts, our approach accurately positions virtual characters and ensures that their interactions and behaviors are realistically and contextually…
