Bactrainus: Optimizing Large Language Models for Multi-hop Complex Question Answering Tasks

Iman Barati; Arash Ghafouri; Behrouz Minaei-Bidgoli

arXiv:2501.06286·cs.CL·January 14, 2025

TL;DR

This paper evaluates large language models on multi-hop question answering using the HotpotQA dataset. It introduces a two-stage selector-reader architecture and applies techniques such as Chain of Thought, reporting F1 score gains of up to 4%.

Contribution

The paper presents a novel two-stage selector-reader architecture and demonstrates the effectiveness of Chain of Thought and question decomposition in domain-specific multi-hop QA tasks.
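To make the selector-reader idea concrete, here is a minimal sketch of how two independent LLM calls could be chained, with a Chain of Thought style instruction in the reader. It is an illustration under stated assumptions, not the authors' implementation: the `LLM` callable type, the function names, the prompt wording, and the `Answer:` output convention are all hypothetical, and no specific model or API is implied.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def select_paragraphs(question: str, paragraphs: List[str], selector: LLM) -> List[str]:
    """Stage 1: ask the selector model which paragraphs are relevant."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(paragraphs))
    prompt = (
        "List the indices of the paragraphs needed to answer the question, "
        "comma-separated.\n"
        f"Question: {question}\nParagraphs:\n{numbered}\nIndices:"
    )
    reply = selector(prompt)
    picked: List[int] = []
    # Keep only valid, non-repeated indices from the model's reply.
    for token in reply.replace(",", " ").split():
        if token.isdecimal() and int(token) < len(paragraphs) and int(token) not in picked:
            picked.append(int(token))
    return [paragraphs[i] for i in picked]


def read_answer(question: str, context: List[str], reader: LLM) -> str:
    """Stage 2: ask the reader model to reason step by step, then answer."""
    joined = "\n".join(context)
    prompt = (
        "Using only the context, reason step by step, then give the final "
        "answer on a last line starting with 'Answer:'.\n"
        f"Context:\n{joined}\nQuestion: {question}\n"
    )
    reply = reader(prompt)
    # Take the last line that follows the assumed 'Answer:' convention.
    for line in reversed(reply.splitlines()):
        if line.strip().lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return reply.strip()  # Fall back to the raw reply if no marker is found.


def answer(question: str, paragraphs: List[str], selector: LLM, reader: LLM) -> str:
    """Run the selector first, then pass only its picks to the reader."""
    return read_answer(question, select_paragraphs(question, paragraphs, selector), reader)
```

Keeping the two stages behind separate callables means the selector and the reader can be different models, which matches the abstract's description of each stage using an independent LLM.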

Findings

01. Up to 4% improvement in F1 score with proposed methods.

02. Two-stage architecture enhances multi-hop reasoning.

03. Chain of Thought techniques improve answer accuracy.
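For readers unfamiliar with the metric, the sketch below shows one common way an answer-level F1 score is computed in extractive QA benchmarks: token overlap between the predicted and gold answers after light normalization. This page does not state which F1 variant the paper reports, so treat the normalization steps and the function names `normalize` and `token_f1` as assumptions for illustration.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a prediction that contains the gold answer plus extra words is penalized through precision: `token_f1("Eiffel Tower in Paris", "Eiffel Tower")` has precision 0.5 and recall 1.0, giving an F1 of 2/3.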

Abstract

In recent years, the use of large language models (LLMs) has significantly increased, and these models have demonstrated remarkable performance in a variety of general language tasks. However, the evaluation of their performance in domain-specific tasks, particularly those requiring deep natural language understanding, has received less attention. In this research, we evaluate the ability of large language models in performing domain-specific tasks, focusing on the multi-hop question answering (MHQA) problem using the HotpotQA dataset. This task, due to its requirement for reasoning and combining information from multiple textual sources, serves as a challenging benchmark for assessing the language comprehension capabilities of these models. To tackle this problem, we have designed a two-stage selector-reader architecture, where each stage utilizes an independent LLM. In addition,…
