DRESS: Instructing Large Vision-Language Models to Align and Interact   with Humans via Natural Language Feedback

Yangyi Chen; Karan Sikka; Michael Cogswell; Heng Ji; Ajay Divakaran

arXiv:2311.10081·cs.CV·March 20, 2024·1 cites

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback

Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran

PDF

Open Access 1 Datasets

TL;DR

DRESS is a large vision-language model that uses natural language feedback to improve alignment with human preferences and enhance multi-turn interaction capabilities, addressing key limitations of prior models.

Contribution

The paper introduces a novel use of natural language feedback, categorized into critique and refinement, to improve LVLM alignment and multi-turn interaction, trained via generalized reinforcement learning.

Findings

01

Generates more helpful responses (+9.76%)

02

Produces more honest responses (+11.52%)

03

Creates safer responses (+21.03%)

Abstract

We present DRESS, a large vision language model (LVLM) that innovatively exploits Natural Language feedback (NLF) from Large Language Models to enhance its alignment and interactions by addressing two key limitations in the state-of-the-art LVLMs. First, prior LVLMs generally rely only on the instruction finetuning stage to enhance alignment with human preferences. Without incorporating extra feedback, they are still prone to generate unhelpful, hallucinated, or harmful responses. Second, while the visual instruction tuning data is generally structured in a multi-turn dialogue format, the connections and dependencies among consecutive conversational turns are weak. This reduces the capacity for effective multi-turn interactions. To tackle these, we propose a novel categorization of the NLF into two key types: critique and refinement. The critique NLF identifies the strengths and…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Datasets

YangyiYY/LVLM_NLF
dataset· 84 dl
84 dl

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMultimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning

MethodsALIGN