AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering

Mahiro Ukai; Shuhei Kurita; Atsushi Hashimoto; Yoshitaka Ushiku; Nakamasa Inoue

arXiv:2407.19410·cs.AI·July 30, 2024

TL;DR

AdaCoder is an adaptive prompt compression framework for visual question answering that reduces prompt length by 71.1% without sacrificing performance, using a two-phase approach with a frozen LLM.

Contribution

AdaCoder introduces an adaptive prompt compression method for visual programmatic models (VPMs) that requires no additional training and is compatible with various large language models.

Findings

01 Reduces prompt length by 71.1%

02 Maintains or improves VQA performance

03 Works with multiple black-box LLMs

Abstract

Visual question answering aims to provide responses to natural language questions given visual input. Recently, visual programmatic models (VPMs), which generate executable programs to answer questions through large language models (LLMs), have attracted research interest. However, they often require long input prompts to provide the LLM with sufficient API usage details to generate relevant code. To address this limitation, we propose AdaCoder, an adaptive prompt compression framework for VPMs. AdaCoder operates in two phases: a compression phase and an inference phase. In the compression phase, given a preprompt that describes all API definitions in the Python language with example snippets of code, a set of compressed preprompts is generated, each depending on a specific question type. In the inference phase, given an input question, AdaCoder predicts the question type and chooses…
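The two phases described in the abstract can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not the authors' implementation: the function names `compress_preprompt` and `build_prompt`, the wording of the compression instruction, the question-type labels, and the fallback to the full preprompt for an unknown type are all hypothetical. The LLM and the question-type predictor are passed in as plain callables so that any frozen, black-box model could be substituted.

```python
from typing import Callable, Dict, List

# Assumption: the frozen LLM is modelled as a text-in, text-out callable.
LLM = Callable[[str], str]


def compress_preprompt(full_preprompt: str,
                       question_types: List[str],
                       llm: LLM) -> Dict[str, str]:
    """Compression phase (sketch): one compressed preprompt per question type."""
    compressed: Dict[str, str] = {}
    for qtype in question_types:
        # Hypothetical instruction wording; the paper's actual prompt is not shown here.
        instruction = (
            f"Keep only the API definitions and examples needed for '{qtype}' questions.\n\n"
            f"{full_preprompt}"
        )
        compressed[qtype] = llm(instruction)
    return compressed


def build_prompt(question: str,
                 compressed: Dict[str, str],
                 predict_type: Callable[[str], str],
                 full_preprompt: str) -> str:
    """Inference phase (sketch): predict the question type, pick its preprompt."""
    qtype = predict_type(question)
    # Assumption: fall back to the uncompressed preprompt for an unknown type.
    preprompt = compressed.get(qtype, full_preprompt)
    return f"{preprompt}\n\n# Question: {question}\n"


# Toy stand-ins so the sketch runs without a real model.
def toy_llm(prompt: str) -> str:
    # Returns the first line of the instruction instead of calling an LLM.
    return prompt.splitlines()[0]


def toy_predict_type(question: str) -> str:
    # Keyword rule standing in for the question-type prediction step.
    return "counting" if question.lower().startswith("how many") else "other"


if __name__ == "__main__":
    table = compress_preprompt("def find(name): ...", ["counting"], toy_llm)
    print(build_prompt("How many dogs are there?", table, toy_predict_type, "FULL PREPROMPT"))
```

In this sketch the compression phase runs once per question type, while the inference phase only performs a lookup, so the prompt sent to the code-generating LLM contains just the selected compressed preprompt and the question.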

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques

Methods: Attention Is All You Need · Sparse Evolutionary Training · Cosine Annealing · Adam · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Dense Connections