MACAROON: Training Vision-Language Models To Be Your Engaged Partners

Shujin Wu; Yi R. Fung; Sha Li; Yixin Wan; Kai-Wei Chang; Heng Ji

arXiv:2406.14137 · cs.CL · October 21, 2024

TL;DR

This paper introduces MACAROON, a training method that enhances vision-language models to proactively engage with users by asking clarifying questions, significantly improving their engagement capabilities without sacrificing general performance.

Contribution

The study establishes a three-tiered hierarchy of invalid, ambiguous, and personalizable questions, builds the PIE evaluation dataset on top of it, and proposes MACAROON, a training approach that improves LVLMs' proactive engagement abilities.

Findings

1. Existing LVLMs perform poorly in proactive engagement (AAR 0.28).
2. MACAROON improves engagement performance to 0.84 AAR.
3. The method maintains comparable general task performance.

Abstract

Large vision-language models (LVLMs), while proficient in following instructions and responding to diverse questions, invariably generate detailed responses even when questions are ambiguous or unanswerable, leading to hallucinations and bias issues. Thus, it is essential for LVLMs to proactively engage with humans to ask for clarifications or additional information for better responses. In this study, we aim to shift LVLMs from passive answer providers to proactive engaged partners. We begin by establishing a three-tiered hierarchy for questions of invalid, ambiguous, and personalizable nature to measure the proactive engagement capabilities of LVLMs. Utilizing this hierarchy, we create PIE (ProactIve Engagement Evaluation) through GPT-4o and human annotators, consisting of 853 questions across six distinct, fine-grained question types that are verified by human annotators and…

Code & Models

Repositories

shujinwu-0814/macaroon (PyTorch, official)

Taxonomy

Topics: Organizational Strategy and Culture

Methods: ALIGN