SEC-QA: A Systematic Evaluation Corpus for Financial QA
Viet Dac Lai, Michael Krumdick, Charles Lovering, Varshini Reddy, Craig Schmidt, Chris Tanner

TL;DR
SEC-QA introduces a novel, continuously updated dataset for financial question answering that reflects real-world scenarios and challenges current models with multi-document, complex reasoning tasks.
Contribution
The paper presents SEC-QA, a semi-automatic, continually refreshed dataset for financial QA, and a program-of-thought-based QA system to improve complex reasoning.
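As a rough illustration of the program-of-thought idea mentioned above, the sketch below has the "model" emit a small program that is then executed, rather than doing the arithmetic in free text. This is not the paper's system: the `lookup` helper, the `ExampleCo` company, the revenue figures, and the hard-coded `generated_program` string are all hypothetical stand-ins chosen for the example.

```python
# Hypothetical facts, as if extracted from two separate annual filings.
# The company name and the numbers are made up for illustration only.
facts = {
    ("ExampleCo", 2021, "revenue"): 120.0,
    ("ExampleCo", 2022, "revenue"): 150.0,
}


def lookup(company, year, metric):
    """Stand-in for a retrieval step over a multi-document collection."""
    return facts[(company, year, metric)]


# In a program-of-thought setup the model would generate code like this.
# Here the program is hard-coded so the example is self-contained.
generated_program = """
prev = lookup("ExampleCo", 2021, "revenue")
curr = lookup("ExampleCo", 2022, "revenue")
answer = (curr - prev) / prev * 100
"""


def run_program(source):
    """Execute the generated program with only `lookup` exposed.

    Note: emptying __builtins__ is not a real sandbox; it only keeps
    the example small.
    """
    namespace = {"__builtins__": {}, "lookup": lookup}
    exec(source, namespace)
    return namespace["answer"]


growth_pct = run_program(generated_program)
print(f"Revenue growth: {growth_pct:.1f}%")
```

The point of the design is that each value comes from an explicit lookup against a specific document, and the final number comes from executed code, so a multi-document question reduces to a handful of retrievals plus deterministic arithmetic.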
Findings
Current retrieval-augmented methods struggle with multi-document questions.
The proposed system enhances reasoning and accuracy in financial QA.
SEC-QA better reflects real-world financial information retrieval challenges.
Abstract
The financial domain frequently deals with large numbers of long documents that are essential for daily operations. Significant effort is put towards automating financial data analysis. However, a persistent challenge, not limited to the finance domain, is the scarcity of datasets that accurately reflect real-world tasks for model evaluation. Existing datasets are often constrained by size, context, or relevance to practical applications. Moreover, LLMs are currently trained on trillions of tokens of text, limiting access to novel data or documents that models have not encountered during training for unbiased evaluation. We propose SEC-QA, a continuous dataset generation framework with two key features: 1) the semi-automatic generation of Question-Answer (QA) pairs spanning multiple long context financial documents, which better represent real-world financial scenarios; 2) the ability…
Taxonomy
Topics: Stock Market Forecasting Methods
