Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks

Fawaz Sammani; Nikos Deligiannis

arXiv:2308.09033 · cs.CV · September 20, 2023

TL;DR

Uni-NLX is a single multi-task model that generates natural language explanations across vision and vision-language tasks, using 7X fewer parameters than task-specific models while maintaining or improving performance. Its two new training datasets are derived with large language models.

Contribution

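The paper introduces a unified framework that consolidates all NLE tasks under one text-generation objective, along with two new datasets, ImageNetX (144K samples) and VQA-ParaX (123K samples). Together these enable multi-task learning with fewer parameters and comparable or better results.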

Findings

01. Capable of performing 7 NLE tasks simultaneously

02. Uses 7X fewer parameters than task-specific models

03. Achieves comparable or superior performance on several tasks

Abstract

Natural Language Explanations (NLE) aim at supplementing the prediction of a model with human-friendly natural text. Existing NLE approaches involve training separate models for each downstream task. In this work, we propose Uni-NLX, a unified framework that consolidates all NLE tasks into a single and compact multi-task model using a unified training objective of text generation. Additionally, we introduce two new NLE datasets: 1) ImageNetX, a dataset of 144K samples for explaining ImageNet categories, and 2) VQA-ParaX, a dataset of 123K samples for explaining the task of Visual Question Answering (VQA). Both datasets are derived leveraging large language models (LLMs). By training on the 1M combined NLE samples, our single unified framework is capable of simultaneously performing seven NLE tasks including VQA, visual recognition and visual reasoning tasks with 7X fewer parameters,…
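The idea of a "unified training objective of text generation" can be illustrated with a minimal sketch: samples from different NLE tasks are cast into the same (prompt, target) text format so that one model can be trained on their union. Everything below is an assumption made for illustration, not the authors' implementation: the `NLESample` fields, the bracketed task tags, the prompt templates, and the "because" connector are all hypothetical, and the image input is omitted entirely.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NLESample:
    """Hypothetical container for one NLE training sample (image omitted)."""
    task: str                       # hypothetical task tag, e.g. "vqa"
    answer: str                     # the prediction being explained
    explanation: str                # the natural language explanation
    question: Optional[str] = None  # only set for question-based tasks


def to_text_pair(sample: NLESample) -> Tuple[str, str]:
    """Cast a sample into a (prompt, target) pair for a text-generation loss.

    The templates are illustrative assumptions, not the paper's prompts.
    """
    if sample.question is not None:
        prompt = f"[{sample.task}] question: {sample.question} answer:"
    else:
        prompt = f"[{sample.task}] the image shows:"
    target = f"{sample.answer} because {sample.explanation}"
    return prompt, target


def merge_datasets(*datasets: List[NLESample]) -> List[Tuple[str, str]]:
    """Concatenate several task datasets into one list of text pairs."""
    return [to_text_pair(s) for ds in datasets for s in ds]


if __name__ == "__main__":
    vqa = [NLESample("vqa", "surfing", "he is riding a wave on a board",
                     question="what is the man doing?")]
    rec = [NLESample("recognition", "goldfinch",
                     "it has a yellow body and black wings")]
    for prompt, target in merge_datasets(vqa, rec):
        print(prompt, "->", target)
```

Because every task reduces to the same pair format in this sketch, a single generative model can be trained on the merged list without task-specific output heads, which is the general property a shared text-generation objective provides.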

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)