Jaeger: A Concatenation-Based Multi-Transformer VQA Model
Jieting Long, Zewei Shi, Penghao Jiang, Yidong Gan

TL;DR
The paper introduces Jaeger, a multi-transformer VQA model that combines features from RoBERTa and GPT2-xl through concatenation and dimensionality reduction, aiming to improve efficiency and performance in document-based VQA tasks.
Contribution
Jaeger is a novel concatenation-based multi-transformer VQA model that leverages pre-trained language models for enhanced feature representation and reduced inference time.
Findings
Achieves competitive performance on PDF-VQA Dataset Task C.
Reduces inference time through dimensionality reduction.
Utilizes concatenation of features from RoBERTa and GPT2-xl.
Abstract
Document-based Visual Question Answering poses a challenging task between linguistic sense disambiguation and fine-grained multimodal retrieval. Although there has been encouraging progress in document-based question answering due to the utilization of large language and open-world prior models \cite{1}, several challenges persist, including prolonged response times, extended inference durations, and imprecision in matching. In order to overcome these challenges, we propose Jaeger, a concatenation-based multi-transformer VQA model. To derive question features, we leverage the exceptional capabilities of RoBERTa large \cite{2} and GPT2-xl \cite{3} as feature extractors. Subsequently, we subject the outputs from both models to a concatenation process. This operation allows the model to consider information from diverse sources concurrently, strengthening its representational capability. By…
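The concatenate-then-reduce step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: random arrays stand in for the real extractor outputs, the widths 1024 and 1600 are the standard hidden sizes of RoBERTa large and GPT2-xl, and PCA is used only as a placeholder because the truncated abstract does not say which dimensionality reduction method Jaeger applies. The function name `concat_and_reduce` and the target width of 16 are hypothetical.

```python
import numpy as np

ROBERTA_LARGE_DIM = 1024  # hidden size of RoBERTa large
GPT2_XL_DIM = 1600        # hidden size of GPT2-xl


def concat_and_reduce(roberta_feats, gpt2_feats, out_dim):
    """Concatenate per-question features, then project them with PCA.

    PCA is an assumed stand-in; the source does not name the
    reduction method that Jaeger actually uses.
    """
    # Place the two feature vectors side by side for every question.
    fused = np.concatenate([roberta_feats, gpt2_feats], axis=1)
    # Center the data, then take the top principal directions via SVD.
    centered = fused - fused.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:out_dim].T
    return fused, reduced


# Random placeholders for pooled question features (not real model outputs).
rng = np.random.default_rng(0)
n_questions = 32
roberta_feats = rng.standard_normal((n_questions, ROBERTA_LARGE_DIM))
gpt2_feats = rng.standard_normal((n_questions, GPT2_XL_DIM))

fused, reduced = concat_and_reduce(roberta_feats, gpt2_feats, out_dim=16)
print(fused.shape, reduced.shape)  # (32, 2624) (32, 16)
```

Concatenation keeps both representations intact, at the cost of a 2624-wide vector per question. The reduction step shrinks that vector before any downstream matching, which is where the claimed inference-time saving would come from.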
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Weight Decay · Softmax · Linear Warmup With Linear Decay · WordPiece · Adam
