Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English
Xiao Zhang, Ruoyu Xiang, Chenhan Yuan, Duanyu Feng, Weiguang Han, Alejandro Lopez-Lira, Xiao-Yang Liu, Sophia Ananiadou, Min Peng, Jimin Huang, Qianqian Xie

TL;DR
This paper introduces Toisón de Oro, a bilingual framework with datasets, models, and benchmarks for Spanish-English financial NLP, demonstrating that specialized bilingual training improves LLM performance in Spanish finance tasks.
Contribution
The paper presents the first comprehensive bilingual financial NLP framework, including datasets, a finetuned LLM, and an evaluation benchmark, addressing the Spanish-English performance gap.
Findings
FinMA-ES outperforms GPT-4 in Spanish financial tasks.
Bilingual instruction tuning enhances LLM performance in Spanish finance.
Existing LLMs show significant multilingual performance gaps and biases.
Abstract
Despite Spanish's pivotal role in the global finance industry, a pronounced gap exists in Spanish financial natural language processing (NLP) and application studies compared to English, especially in the era of large language models (LLMs). To bridge this gap, we unveil Toisón de Oro, the first bilingual framework that establishes instruction datasets, finetuned LLMs, and evaluation benchmark for financial LLMs in Spanish joint with English. We construct a rigorously curated bilingual instruction dataset including over 144K Spanish and English samples from 15 datasets covering 7 tasks. Harnessing this, we introduce FinMA-ES, an LLM designed for bilingual financial applications. We evaluate our model and existing LLMs using FLARE-ES, the first comprehensive bilingual evaluation benchmark with 21 datasets covering 9 tasks. The FLARE-ES benchmark results reveal a significant…
Taxonomy
Topics: Legal Language and Interpretation
Methods: Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Dropout · Multi-Head Attention
