Visual Instruction Tuning with Polite Flamingo

Delong Chen; Jianfeng Liu; Wenliang Dai; Baoyuan Wang

arXiv:2307.01003 · cs.CV · December 19, 2023

TL;DR

This paper introduces Polite Flamingo, a multi-modal response rewriter that converts terse, unformatted vision-language annotations into more polite, better-formatted responses, with the aim of improving human preference for models tuned on them.

Contribution

It proposes Polite Flamingo for response rewriting, creates the PF-1M dataset, and develops Clever Flamingo with novel tuning methods to improve multi-modal understanding and response politeness.

Findings

1. Clever Flamingo outperforms baselines in understanding tasks.

2. Polite Flamingo improves response politeness and formatting.

3. PF-1M dataset enhances training quality.

Abstract

Recent research has demonstrated that multi-task fine-tuning of multi-modal Large Language Models (LLMs) on an assortment of annotated downstream vision-language datasets significantly enhances their performance. Yet this process has a side effect, which we term the "multi-modal alignment tax". It impairs the model's ability to format responses appropriately -- for instance, its "politeness" -- because raw annotations are overly succinct and unformatted, and this reduces human preference. In this paper, we introduce Polite Flamingo, a multi-modal response rewriter that transforms raw annotations into a more appealing, "polite" format. Polite Flamingo is trained to reconstruct high-quality responses from their automatically distorted counterparts and is subsequently applied to a vast array of vision-language datasets for…
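The reconstruction objective in the abstract can be illustrated with a small sketch of how (distorted, original) training pairs might be built. The abstract does not say which distortions are used, so the courtesy patterns, the `distort` function, and the `RewritePair` type below are hypothetical stand-ins. This is text-only and omits the image input that a multi-modal rewriter would also receive.

```python
import re
from dataclasses import dataclass

# Hypothetical courtesy phrases to strip; the source does not list
# the actual distortions used to train Polite Flamingo.
COURTESY_PATTERNS = [
    r"^\s*sure[,!.]?\s*",
    r"^\s*certainly[,!.]?\s*",
    r"\s*i hope this helps[.!]?\s*$",
]


@dataclass
class RewritePair:
    source: str  # distorted, annotation-like text (rewriter input)
    target: str  # original high-quality response (rewriter target)


def distort(response: str) -> str:
    """Make a polished response resemble a terse raw annotation."""
    text = response
    for pattern in COURTESY_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Keep only the first sentence, then drop case and final punctuation.
    first_sentence = re.split(r"(?<=[.!?])\s+", text.strip())[0]
    return first_sentence.lower().rstrip(".!?")


def make_pair(response: str) -> RewritePair:
    """Pair a distorted input with the response it should be rewritten to."""
    return RewritePair(source=distort(response), target=response)


pair = make_pair(
    "Sure! The image shows a brown dog on a beach. "
    "It appears to be playing fetch. I hope this helps!"
)
print(pair.source)  # the image shows a brown dog on a beach
```

A rewriter trained on such pairs learns the mapping from `source` to `target`. Applying it to genuinely terse annotations is then expected to yield more complete, politely formatted responses.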

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling