Curriculum Script Distillation for Multilingual Visual Question Answering

Khyathi Raghavi Chandu; Alborz Geramifard

arXiv:2301.07227·cs.CL·January 19, 2023

Open Access

TL;DR

This paper introduces a curriculum script distillation method for multilingual visual question answering: pre-trained models are fine-tuned with a curriculum built from source- and target-language translations, and the experiments show that script strongly affects performance across languages.

Contribution

The paper proposes a curriculum-based fine-tuning approach, built on source- and target-language translations, that uses language script information to improve multilingual VQA performance.

Findings

01. Target languages that share the same script perform about 6% better than other languages.

02. Mixed-script code-switched languages perform about 5-12% better than their counterparts.

03. Script plays a vital role in the performance of multilingual VQA models.

Abstract

Pre-trained models with dual and cross encoders have shown remarkable success in propelling the landscape of several tasks in vision and language in Visual Question Answering (VQA). However, since they are limited by the requirements of gold annotated data, most of these advancements do not see the light of day in other languages beyond English. We aim to address this problem by introducing a curriculum based on the source and target language translations to finetune the pre-trained models for the downstream task. Experimental results demonstrate that script plays a vital role in the performance of these models. Specifically, we show that target languages that share the same script perform better (~6%) than other languages and mixed-script code-switched languages perform better than their counterparts (~5-12%).
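
The abstract's two script-related findings depend on knowing which script a question is written in and whether it mixes scripts. The sketch below is an illustration only, not the authors' method: it assumes a rough heuristic in which the first word of each letter's Unicode character name (for example `LATIN` or `DEVANAGARI`) is used as the script label. The function names `script_counts`, `dominant_script`, `is_mixed_script`, and `shares_script` are hypothetical, and the sample questions are made up for this example.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters per script.

    Assumption: the first word of the Unicode character name
    (e.g. "LATIN", "CYRILLIC", "DEVANAGARI") is a good enough script label.
    Digits, punctuation, spaces, and combining marks are ignored.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts


def dominant_script(text):
    """Return the most frequent script label in the text, or None if it has no letters."""
    counts = script_counts(text)
    return counts.most_common(1)[0][0] if counts else None


def is_mixed_script(text):
    """True when the letters in the text come from more than one script."""
    return len(script_counts(text)) > 1


def shares_script(a, b):
    """True when both texts have the same dominant script."""
    script_a = dominant_script(a)
    return script_a is not None and script_a == dominant_script(b)


# Made-up example questions (not taken from the paper's data).
english = "What color is the car?"
spanish = "¿De qué color es el coche?"
hindi = "कार का रंग क्या है?"
code_switched = "car का color क्या है?"

print(dominant_script(english), dominant_script(hindi))
print(shares_script(english, spanish), shares_script(english, hindi))
print(is_mixed_script(code_switched))
```

A heuristic like this could be used to group target languages by script before comparing results, but a real pipeline would more likely rely on language metadata or a dedicated script-detection library.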

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning