On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering

Linyong Nan; Ellen Zhang; Weijin Zou; Yilun Zhao; Wenfei Zhou; Arman Cohan

arXiv:2311.09721 · cs.CL · November 17, 2023 · 1 citation

Open Access

TL;DR

This paper presents a new dataset and evaluation framework for assessing how well Large Language Models can interact with SQL interpreters to answer complex database questions, revealing key challenges and proposing solutions.

Contribution

It introduces a novel long-form database question answering dataset, analyzes interaction bottlenecks, and develops a multi-agent peer-review evaluation framework for LLMs.

Findings

01. GPT-4 struggles with the task despite advanced capabilities.

02. Planning and multi-query generation are primary bottlenecks.

03. Multi-agent evaluation improves answer quality assessment.

Abstract

This study introduces a new long-form database question answering dataset designed to evaluate how Large Language Models (LLMs) interact with a SQL interpreter. The task requires LLMs to strategically generate multiple SQL queries to retrieve sufficient data from a database, to reason over the acquired context, and to synthesize it into a comprehensive analytical narrative. Our findings highlight that this task poses great challenges even for the state-of-the-art GPT-4 model. We propose and evaluate two interaction strategies, and provide a fine-grained analysis of the individual stages within the interaction. A key discovery is the identification of two primary bottlenecks hindering effective interaction: the capacity for planning and the ability to generate multiple SQL queries. To address the challenge of accurately assessing answer quality, we introduce a multi-agent…
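
The interaction described in the abstract (plan several SQL queries, execute them, then synthesize the results into an answer) can be illustrated with a minimal sketch. This is not the authors' implementation and does not reproduce either of their two interaction strategies. The names `run_interaction`, `toy_propose_queries`, `toy_synthesize`, and `build_demo_db`, as well as the `sales` table, are hypothetical; the two "toy" functions are fixed stand-ins for what would be LLM calls in a real system.

```python
import sqlite3
from typing import Callable, List, Tuple

# One observation pairs a SQL string with the rows it returned.
Observation = Tuple[str, List[tuple]]


def run_interaction(
    conn: sqlite3.Connection,
    question: str,
    propose_queries: Callable[[str], List[str]],
    synthesize: Callable[[str, List[Observation]], str],
) -> str:
    """Execute every proposed query, collect results, then synthesize an answer."""
    observations: List[Observation] = []
    for sql in propose_queries(question):
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Record the failure so the synthesis step can see it.
            rows = [("ERROR", str(exc))]
        observations.append((sql, rows))
    return synthesize(question, observations)


def toy_propose_queries(question: str) -> List[str]:
    """Hypothetical planner: a real agent would ask an LLM for these queries."""
    return [
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region",
        "SELECT COUNT(*) FROM sales",
    ]


def toy_synthesize(question: str, observations: List[Observation]) -> str:
    """Hypothetical writer: a real agent would ask an LLM for a narrative."""
    lines = [f"Q: {question}"]
    for sql, rows in observations:
        lines.append(f"{sql} -> {rows}")
    return "\n".join(lines)


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with an assumed example schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 100), ("south", 250), ("north", 50)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    print(
        run_interaction(
            build_demo_db(),
            "How do sales compare by region?",
            toy_propose_queries,
            toy_synthesize,
        )
    )
```

Keeping the planner and the writer as separate callables mirrors the stages the abstract singles out: query planning and multi-query generation can be swapped or inspected independently of the final synthesis step.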

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Expert finding and Q&A systems

Methods: Multi-Head Attention · Attention Is All You Need · Adam · Softmax · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Residual Connection