Greenback Bears and Fiscal Hawks: Finance is a Jungle and Text Embeddings Must Adapt
Peter Anderson, Mano Vikash Janardhanan, Jason He, Wei Cheng, Charlie Flanagan

TL;DR
This paper introduces BAM embeddings, a domain-specific text embedding model trained on a large financial dataset, significantly outperforming general-purpose embeddings in finance-related tasks and enhancing question answering accuracy.
Contribution
The paper presents BAM embeddings, a new finance-specific text embedding model trained on 14.3 million query-passage pairs, with detailed methodology and evaluation benchmarks.
Findings
BAM embeddings achieve 62.8% Recall@1 on a held-out finance test set, vs. 39.2% for the best general-purpose text embedding from OpenAI.
BAM improves question answering accuracy by 8% on FinanceBench.
Domain-specific training enhances sensitivity to finance terminology.
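For readers unfamiliar with the metric, the Recall@1 figure reported here can be read as the fraction of queries whose correct passage is ranked first by embedding similarity. The sketch below illustrates that computation on toy vectors. It assumes cosine similarity as the ranking score and exactly one gold passage per query; the summary does not state either detail, and the function names and data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_k(query_vecs, passage_vecs, gold_ids, k=1):
    """Fraction of queries whose gold passage appears in the top-k results.

    Assumption: ranking uses cosine similarity and each query has a
    single gold passage index in gold_ids.
    """
    hits = 0
    for query, gold in zip(query_vecs, gold_ids):
        # Rank passage indices from most to least similar to the query.
        ranked = sorted(
            range(len(passage_vecs)),
            key=lambda i: cosine(query, passage_vecs[i]),
            reverse=True,
        )
        if gold in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy 2-D "embeddings" standing in for real model outputs.
passages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
gold = [0, 1, 2]  # index of the correct passage for each query

r1 = recall_at_k(queries, passages, gold, k=1)
r2 = recall_at_k(queries, passages, gold, k=2)
print(f"Recall@1 = {r1:.3f}, Recall@2 = {r2:.3f}")
```

In this toy setup the third query ranks the wrong passage first, so Recall@1 is lower than Recall@2. A real evaluation would replace the toy vectors with embeddings produced by the model under test.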
Abstract
Financial documents are filled with specialized terminology, arcane jargon, and curious acronyms that pose challenges for general-purpose text embeddings. Yet, few text embeddings specialized for finance have been reported in the literature, perhaps in part due to a lack of public datasets and benchmarks. We present BAM embeddings, a set of text embeddings finetuned on a carefully constructed dataset of 14.3M query-passage pairs. Demonstrating the benefits of domain-specific training, BAM embeddings achieve Recall@1 of 62.8% on a held-out test set, vs. only 39.2% for the best general-purpose text embedding from OpenAI. Further, BAM embeddings increase question answering accuracy by 8% on FinanceBench and show increased sensitivity to the finance-specific elements that are found in detailed, forward-looking and company and date-specific queries. To support further research we describe…
Taxonomy
Topics: Economic, financial, and policy analysis
Methods: Bottleneck Attention Module · Sparse Evolutionary Training
