MIA 2022 Shared Task Submission: Leveraging Entity Representations, Dense-Sparse Hybrids, and Fusion-in-Decoder for Cross-Lingual Question Answering
Zhucheng Tu, Sarguna Janani Padmanabhan

TL;DR
This paper presents a two-stage cross-lingual question answering system that pairs hybrid dense-sparse passage retrieval with a Fusion-in-Decoder reader, improving over the official shared-task baseline by more than 4 F1 points on both the development and test sets.
Contribution
It combines a multilingual language model pretrained with entity representations, hybrid sparse and dense retrieval, and a Fusion-in-Decoder reader for cross-lingual open-retrieval QA.
Findings
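The system achieves 43.46 F1 on the XOR-TyDi QA development set and 21.99 F1 on the MKQA development set, for an average of 32.73.
It improves over the official baseline by more than 4 F1 points on both the development and test sets.
The results show the value of entity representations in pretraining, sparse retrieval signals that support dense retrieval, and Fusion-in-Decoder.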
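The F1 figures above are answer-level scores. As a minimal sketch of how such a score is commonly computed in open-domain QA, the block below implements token-overlap F1 and a simple average across datasets. It assumes lowercasing and whitespace tokenization; the shared task's official evaluation script may normalize and tokenize differently (for example, per language), and the function names `token_f1` and `macro_average` are illustrative, not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string.

    Assumption: lowercase + whitespace tokenization, which is only a rough
    stand-in for the normalization an official scorer would apply.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def macro_average(scores: Sequence[float]) -> float:
    """Unweighted mean of per-dataset scores."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(token_f1("Barack Obama", "Obama"))
    print(macro_average([43.46, 21.99]))
```

Averaging the two development-set scores this way gives 32.725, which is consistent with the reported 32.73 after rounding.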
Abstract
We describe our two-stage system for the Multilingual Information Access (MIA) 2022 Shared Task on Cross-Lingual Open-Retrieval Question Answering. The first stage consists of multilingual passage retrieval with a hybrid dense and sparse retrieval strategy. The second stage consists of a reader which outputs the answer from the top passages returned by the first stage. We show the efficacy of using a multilingual language model with entity representations in pretraining, sparse retrieval signals to help dense retrieval, and Fusion-in-Decoder. On the development set, we obtain 43.46 F1 on XOR-TyDi QA and 21.99 F1 on MKQA, for an average F1 score of 32.73. On the test set, we obtain 40.93 F1 on XOR-TyDi QA and 22.29 F1 on MKQA, for an average F1 score of 31.61. We improve over the official baseline by over 4 F1 points on both the development and test sets.
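The abstract does not say how the dense and sparse signals are combined, so the following is only a sketch of one common way to build a hybrid retriever: min-max normalize each retriever's scores, then interpolate them linearly. The weight `alpha`, the helper names `min_max_normalize` and `hybrid_rank`, and the toy passage IDs and scores are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: no ranking signal, so map everything to 0.
        return {pid: 0.0 for pid in scores}
    return {pid: (s - lo) / (hi - lo) for pid, s in scores.items()}


def hybrid_rank(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    alpha: float = 0.5,  # assumed interpolation weight, not from the paper
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Fuse dense and sparse passage scores and return the top-k passages."""
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    # A passage missing from one retriever's list contributes 0 from it.
    candidates = set(dense_n) | set(sparse_n)
    fused = {
        pid: alpha * dense_n.get(pid, 0.0) + (1 - alpha) * sparse_n.get(pid, 0.0)
        for pid in candidates
    }
    # Sort by fused score (descending), breaking ties by passage ID.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    dense_scores = {"p1": 0.9, "p2": 0.7, "p3": 0.1}     # toy dense scores
    sparse_scores = {"p2": 12.0, "p4": 8.0, "p3": 2.0}   # toy BM25-like scores
    for pid, score in hybrid_rank(dense_scores, sparse_scores):
        print(pid, round(score, 3))
```

In a two-stage system of the kind described, the top-ranked passages from such a fusion step would be passed to the reader, which generates the answer.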
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Test
