COLLAGE: Collaborative Human-Agent Interaction Generation using Hierarchical Latent Diffusion and Language Models
Divyanshu Daiya, Damon Conover, Aniket Bera

TL;DR
COLLAGE is a new framework that combines large language models and hierarchical VQ-VAE-based diffusion models to generate realistic, diverse, and controllable collaborative human-object-human interactions, compensating for the lack of rich datasets in this domain.
Contribution
It introduces a hierarchical VQ-VAE architecture with a latent diffusion model guided by LLMs for motion generation, enabling multi-resolution, prompt-specific interaction synthesis.
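The summary does not say how the hierarchy is built. Below is a minimal sketch of one plausible design. A coarse codebook quantizes temporally pooled motion features. A fine codebook quantizes only the residual that the coarse level leaves behind at the full frame rate, which is one way to keep the two levels from encoding redundant concepts. Encoders, decoders, and training losses are omitted. The names `quantize` and `hierarchical_quantize`, the pooling factor, and the codebook sizes are hypothetical and are not taken from COLLAGE.

```python
import numpy as np


def quantize(z, codebook):
    """Replace each row of z (N, D) with its nearest codebook row (K, D) under L2."""
    # Pairwise squared distances via broadcasting: shape (N, K)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx


def hierarchical_quantize(x, coarse_cb, fine_cb, factor=2):
    """Hypothetical two-level quantizer for a motion sequence x of shape (T, D).

    The coarse level sees time-pooled features; the fine level only encodes
    the residual the coarse level could not explain.
    """
    T, D = x.shape
    x = x[: T - T % factor]  # drop trailing frames that do not fill a window
    # Coarse level: average-pool non-overlapping windows of `factor` frames
    coarse_in = x.reshape(-1, factor, D).mean(axis=1)
    coarse_q, coarse_idx = quantize(coarse_in, coarse_cb)
    # Upsample coarse codes back to the full frame rate by repetition
    coarse_up = np.repeat(coarse_q, factor, axis=0)
    # Fine level: quantize what the coarse reconstruction missed
    fine_q, fine_idx = quantize(x - coarse_up, fine_cb)
    return coarse_up + fine_q, coarse_idx, fine_idx


# Toy usage with random features and codebooks (illustrative only)
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))  # 8 frames, 4-dim motion features
coarse_cb = rng.normal(size=(16, 4))
fine_cb = rng.normal(scale=0.5, size=(32, 4))
fine_cb[0] = 0.0  # a zero code lets the fine level choose "no correction"
recon, coarse_idx, fine_idx = hierarchical_quantize(x, coarse_cb, fine_cb)
```

The fine codebook contains a zero vector, so the two-level reconstruction can never be worse than the coarse level alone. Splitting the signal into a coarse code plus a residual code is only one way to obtain a multi-resolution representation.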
Findings
Outperforms state-of-the-art methods on the CORE-4D and InterHuman datasets.
Generates realistic and diverse collaborative interactions.
Provides greater control and diversity in motion generation.
Abstract
We propose a novel framework, COLLAGE, for generating collaborative agent-object-agent interactions by leveraging large language models (LLMs) and hierarchical motion-specific vector-quantized variational autoencoders (VQ-VAEs). Our model addresses the lack of rich datasets in this domain by incorporating the knowledge and reasoning abilities of LLMs to guide a generative diffusion model. The hierarchical VQ-VAE architecture captures different motion-specific characteristics at multiple levels of abstraction, avoiding redundant concepts and enabling efficient multi-resolution representation. We introduce a diffusion model that operates in the latent space and incorporates LLM-generated motion planning cues to guide the denoising process, resulting in prompt-specific motion generation with greater control and diversity. Experimental results on the CORE-4D and InterHuman datasets…
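The abstract describes denoising in a latent space under the guidance of LLM-generated planning cues. The sketch below illustrates only the generic mechanics and rests on three assumptions. It uses a standard linear DDPM noise schedule. It uses classifier-free guidance as a stand-in conditioning mechanism, since the abstract does not say which mechanism COLLAGE uses. A hypothetical `toy_eps_model` replaces a trained denoiser. The cue is an arbitrary vector `cue`, because how COLLAGE encodes LLM output is not described here.

```python
import numpy as np


def make_schedule(T=100, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (assumed; not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def q_sample(z0, t, alpha_bars, noise):
    """Forward process: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * noise


def guided_eps(eps_model, z_t, t, cond, w):
    """Classifier-free guidance: extrapolate from the unconditional prediction."""
    eps_uncond = eps_model(z_t, t, None)
    eps_cond = eps_model(z_t, t, cond)
    return eps_uncond + w * (eps_cond - eps_uncond)


def sample(eps_model, cond, shape, schedule, rng, w=1.0):
    """Ancestral DDPM sampling from pure noise down to t = 0."""
    betas, alphas, alpha_bars = schedule
    z = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        eps = guided_eps(eps_model, z, t, cond, w)
        # Posterior mean of z_{t-1} given z_t and the predicted noise
        z = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            z = z + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return z


schedule = make_schedule()


def toy_eps_model(z_t, t, cond):
    """Hypothetical denoiser: exact noise prediction for data concentrated at
    `cond` (or at the origin when unconditioned)."""
    abar = schedule[2][t]
    target = np.zeros_like(z_t) if cond is None else cond
    return (z_t - np.sqrt(abar) * target) / np.sqrt(1.0 - abar)


rng = np.random.default_rng(0)
cue = np.array([0.5, -1.0, 2.0])  # stand-in for an embedded planning cue
z_final = sample(toy_eps_model, cue, cue.shape, schedule, rng, w=1.0)
```

With this idealized denoiser, `w=1.0` recovers `cue` up to floating-point error, and `w=2.0` lands on `2 * cue`. This shows in miniature how a larger guidance weight pushes samples further along the conditioning direction. A full pipeline would then map the final latent back to motion through a decoder.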
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Diffusion · VQ-VAE
