LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Wei-Ge Chen; Irina Spiridonova; Jianwei Yang; Jianfeng Gao; Chunyuan Li

arXiv:2311.00571·cs.CV·November 2, 2023·5 cites

TL;DR

LLaVA-Interactive is a multimodal human-AI interaction system that supports multi-turn dialogues and visual prompts, and that integrates existing AI models for visual chat, segmentation, and image generation and editing without additional training.

Contribution

It introduces an all-in-one, cost-efficient multimodal interaction prototype combining multiple pre-trained models for diverse visual and conversational tasks.

Findings

01. Supports multi-turn multimodal dialogues
02. Enables visual prompts for aligning human intents
03. Demonstrates diverse application scenarios

Abstract

LLaVA-Interactive is a research prototype for multimodal human-AI interaction. The system can hold multi-turn dialogues with human users by taking multimodal user inputs and generating multimodal responses. Importantly, LLaVA-Interactive goes beyond language prompts: visual prompts are also supported, helping align the system with human intent during the interaction. The development of LLaVA-Interactive is extremely cost-efficient because the system combines three multimodal skills of pre-built AI models without additional model training: visual chat from LLaVA, image segmentation from SEEM, and image generation and editing from GLIGEN. A diverse set of application scenarios is presented to demonstrate the promise of LLaVA-Interactive and to inspire future research in multimodal interactive systems.
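
The composition described in the abstract, three frozen skills behind one multi-turn interface, can be illustrated with a small sketch. This is not the authors' implementation: the `Turn`, `Session`, and `step` names are hypothetical, and the three skill functions are string-returning stand-ins for where LLaVA, SEEM, and GLIGEN would be called. It only shows the idea of routing each user turn to a pre-built skill while sharing image state across turns.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Turn:
    skill: str                            # "chat", "segment", or "generate"
    text: str = ""                        # language prompt
    visual_prompt: Optional[dict] = None  # e.g. a user-drawn stroke or box


@dataclass
class Session:
    image: Optional[str] = None           # stand-in for the current image
    history: List[str] = field(default_factory=list)


def chat_skill(session: Session, turn: Turn) -> str:
    # Hypothetical stand-in for a visual-chat model such as LLaVA.
    return f"[chat] answer about {session.image!r}: {turn.text}"


def segment_skill(session: Session, turn: Turn) -> str:
    # Hypothetical stand-in for a promptable segmenter such as SEEM.
    return f"[segment] mask on {session.image!r} from {turn.visual_prompt}"


def generate_skill(session: Session, turn: Turn) -> str:
    # Hypothetical stand-in for a grounded generator/editor such as GLIGEN.
    # It replaces the shared image so later turns see the new result.
    session.image = f"generated({turn.text})"
    return f"[generate] new image {session.image!r}"


# No skill is trained here; the system only dispatches to existing ones.
SKILLS: Dict[str, Callable[[Session, Turn], str]] = {
    "chat": chat_skill,
    "segment": segment_skill,
    "generate": generate_skill,
}


def step(session: Session, turn: Turn) -> str:
    """Route one user turn to a skill and record the response."""
    if turn.skill not in SKILLS:
        raise ValueError(f"unknown skill: {turn.skill}")
    reply = SKILLS[turn.skill](session, turn)
    session.history.append(reply)
    return reply
```

A session might generate an image, segment a region selected with a visual prompt, and then ask a question about the result, with each call to `step` operating on the same `Session` object.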

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Topic Modeling

Methods: Sparse Evolutionary Training · ALIGN