ILLUME: Rationalizing Vision-Language Models through Human Interactions

Manuel Brack; Patrick Schramowski; Björn Deiseroth; Kristian Kersting

arXiv:2208.08241·cs.LG·June 1, 2023

TL;DR

ILLUME is a human-in-the-loop tuning method that improves vision-language models' rationalization abilities by iteratively incorporating human feedback on generated rationales, achieving competitive performance with less data.

Contribution

This paper introduces ILLUME, a tuning paradigm based on human interaction that aligns VLMs' rationalizations with human intent using minimal feedback.

Findings

01 ILLUME achieves results competitive with standard supervised fine-tuning while using less training data.

02 The method aligns model rationales with human preferences.

03 Minimal human feedback suffices for significant improvements.

Abstract

Bootstrapping from pre-trained language models has proven to be an efficient approach for building vision-language models (VLMs) for tasks such as image captioning or visual question answering. However, the outputs of these models rarely align with users' rationales for specific answers. To improve this alignment and reinforce commonsense reasons, we propose a tuning paradigm based on human interactions with machine-generated data. ILLUME executes the following loop: given an image-question-answer prompt, the VLM samples multiple candidate rationales, and a human critic provides feedback via preference selection, which is then used for fine-tuning. This loop increases the training data and gradually carves out the VLM's rationalization capabilities that are aligned with human intent. Our exhaustive experiments demonstrate that ILLUME is competitive with standard supervised finetuning…
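
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `illume_loop` and the callables `sample_rationales`, `human_select`, and `fine_tune` are hypothetical names, and fine-tuning on the accumulated set of accepted rationales after each round is an assumption drawn from the abstract's remark that the loop "increases the training data".

```python
from typing import Callable, List, Tuple

# (image identifier, question, answer); the representation is an assumption.
Prompt = Tuple[str, str, str]
# A prompt paired with one rationale the human critic accepted.
Example = Tuple[Prompt, str]


def illume_loop(
    prompts: List[Prompt],
    sample_rationales: Callable[[Prompt, int], List[str]],  # hypothetical VLM sampler
    human_select: Callable[[Prompt, List[str]], List[str]],  # hypothetical human critic
    fine_tune: Callable[[List[Example]], None],  # hypothetical tuning step
    iterations: int = 3,
    num_candidates: int = 5,
) -> List[Example]:
    """Sample candidate rationales, keep the human-preferred ones, then tune."""
    training_data: List[Example] = []
    for _ in range(iterations):
        for prompt in prompts:
            # The VLM proposes several candidate rationales for this prompt.
            candidates = sample_rationales(prompt, num_candidates)
            # The critic keeps the candidates that match their intent (possibly none).
            accepted = human_select(prompt, candidates)
            training_data.extend((prompt, rationale) for rationale in accepted)
        # Assumption: tune on everything accepted so far; skip if nothing was accepted.
        if training_data:
            fine_tune(training_data)
    return training_data
```

In this sketch the critic may reject every candidate for a prompt, in which case that prompt contributes nothing in that round and can be sampled again after the next tuning step.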

Code & Models

Repositories

ml-research/ILLUME
PyTorch · Official

Datasets

AIML-TUDA/socio-moral-image-rationales
dataset · 8 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: ALIGN