Bactrainus: Optimizing Large Language Models for Multi-hop Complex Question Answering Tasks
Iman Barati, Arash Ghafouri, Behrouz Minaei-Bidgoli

TL;DR
This paper evaluates large language models on multi-hop question answering using the HotpotQA dataset. It introduces a two-stage selector-reader architecture and applies techniques such as Chain of Thought and question decomposition, reporting F1 score gains of up to 4%.
Contribution
It presents a novel two-stage selector-reader architecture and demonstrates the effectiveness of Chain of Thought and question decomposition in domain-specific multi-hop QA tasks.
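The selector-reader idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function names, the `k` parameter, and the `LLM` callable type are all assumptions introduced here, and the Chain of Thought cue in the reader prompt is only one possible phrasing.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


def select_paragraphs(question: str, paragraphs: List[str],
                      selector: LLM, k: int = 2) -> List[str]:
    """Stage 1: ask the selector model which paragraphs are needed."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(paragraphs))
    prompt = (
        f"Question: {question}\n"
        f"Paragraphs:\n{numbered}\n"
        f"List the indices of the {k} paragraphs needed to answer, "
        "comma-separated."
    )
    reply = selector(prompt)
    indices: List[int] = []
    # Keep only valid, non-repeated paragraph indices from the reply.
    for token in reply.replace(",", " ").split():
        if token.isdecimal():
            idx = int(token)
            if idx < len(paragraphs) and idx not in indices:
                indices.append(idx)
    return [paragraphs[i] for i in indices[:k]]


def read_answer(question: str, context: List[str], reader: LLM) -> str:
    """Stage 2: ask the reader model to reason step by step, then answer."""
    prompt = (
        "Context:\n" + "\n".join(context) + "\n"
        f"Question: {question}\n"
        "Think step by step, then give the final answer after 'Answer:'."
    )
    reply = reader(prompt)
    # Assumed output convention: the text after the last 'Answer:' marker.
    return reply.rsplit("Answer:", 1)[-1].strip()


def answer_question(question: str, paragraphs: List[str],
                    selector: LLM, reader: LLM, k: int = 2) -> str:
    """Run the selector, then pass only the selected paragraphs to the reader."""
    selected = select_paragraphs(question, paragraphs, selector, k)
    return read_answer(question, selected, reader)
```

Because each stage takes its model as a plain callable, the selector and reader can be two independent LLMs, matching the description above, or stub functions when testing.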
Findings
Up to 4% improvement in F1 score with proposed methods.
Two-stage architecture enhances multi-hop reasoning.
Chain of Thought techniques improve answer accuracy.
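For context on the F1 figures, answer-level F1 in extractive QA is commonly computed as token overlap between the predicted and gold answers. The sketch below follows that common convention. The normalization steps (lowercasing, stripping punctuation and English articles) are assumptions here and may differ from the exact evaluation used in the paper.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a prediction that contains the gold answer plus extra tokens keeps full recall but loses precision, so partially correct answers still earn partial credit.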
Abstract
In recent years, the use of large language models (LLMs) has increased significantly, and these models have demonstrated remarkable performance on a variety of general language tasks. However, their performance on domain-specific tasks, particularly those requiring deep natural language understanding, has received less attention. In this research, we evaluate the ability of large language models to perform domain-specific tasks, focusing on the multi-hop question answering (MHQA) problem using the HotpotQA dataset. Because this task requires reasoning over and combining information from multiple textual sources, it serves as a challenging benchmark for assessing the language comprehension capabilities of these models. To tackle this problem, we have designed a two-stage selector-reader architecture, where each stage utilizes an independent LLM. In addition,…
