Team Anotheroption at SemEval-2025 Task 8: Bridging the Gap Between Open-Source and Proprietary LLMs in Table QA

Nikolas Evkarpidi; Elena Tutubalina

arXiv:2506.09657 · cs.CL · June 17, 2025


TL;DR

This paper introduces a table question answering system that combines several LLM-based modules. It reached 80% accuracy in the SemEval 2025 Task 8 evaluation phase and brings open-source models to performance comparable with proprietary LLMs.

Contribution

It presents an integrated pipeline with text-to-SQL, text-to-code, self-correction, and retrieval-augmented generation modules, demonstrating improved accuracy in table QA tasks.

Findings

01 Achieved 80% accuracy in SemEval 2025 Task 8
02 Top-13 ranking among 38 teams
03 Significant accuracy improvement for open-source models

Abstract

This paper presents a system developed for SemEval 2025 Task 8: Question Answering (QA) over tabular data. Our approach integrates several key components: text-to-SQL and text-to-code generation modules, a self-correction mechanism, and a retrieval-augmented generation (RAG) component. It also includes an end-to-end (E2E) module, and all of these are orchestrated by a large language model (LLM). Through ablation studies, we analyzed the effects of different parts of our pipeline and identified the challenges that are still present in this field. During the evaluation phase of the competition, our solution achieved an accuracy of 80%, resulting in a top-13 ranking among the 38 participating teams. Our pipeline demonstrates a significant improvement in accuracy for open-source models and achieves performance comparable to proprietary LLMs in QA tasks over tables. The code is available at GitHub…
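To make the text-to-code and self-correction idea concrete, here is a minimal sketch of how such a loop might look. It is not the authors' implementation: the function names, the `(question, previous_error)` generator signature, the list-of-dicts table, and the stub that stands in for an LLM call are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

Table = List[Dict[str, Any]]
# Hypothetical interface: (question, previous_error) -> Python source code.
CodeGenerator = Callable[[str, Optional[str]], str]


def answer_with_self_correction(
    table: Table,
    question: str,
    generate_code: CodeGenerator,
    max_attempts: int = 3,
) -> Any:
    """Run generated code on the table, feeding any error back to the generator."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        source = generate_code(question, error)
        scope: Dict[str, Any] = {"table": table}
        try:
            # Illustration only: exec on untrusted code is unsafe; sandbox it in practice.
            exec(source, scope)
            return scope["answer"]
        except Exception as exc:
            # The error message becomes extra context for the next attempt.
            error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(
        f"no valid program after {max_attempts} attempts; last error: {error}"
    )


def stub_generator(question: str, error: Optional[str]) -> str:
    """Stand-in for an LLM call (assumption): the first attempt uses a wrong column name."""
    if error is None:
        return "answer = max(row['Age'] for row in table)"
    return "answer = max(row['age'] for row in table)"


table = [{"name": "Ann", "age": 31}, {"name": "Bob", "age": 45}]
result = answer_with_self_correction(table, "What is the maximum age?", stub_generator)
print(result)
```

In this sketch the first generated program fails with a `KeyError`, the error text is passed back to the generator, and the second attempt succeeds. A real system would replace `stub_generator` with a model call and would likely restrict what the generated code can do.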


Code & Models

Repositories

Nickolas-option/QA_on_Tabular_Data_SemEval2025_Task8

Videos

Team Anotheroption at SemEval-2025 Task 8: Bridging the Gap Between Open-Source and Proprietary LLMs in Table QA · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies