Is GPT-OSS All You Need? Benchmarking Large Language Models for Financial Intelligence and the Surprising Efficiency Paradox
Ziqian Bi, Danyang Zhang, Junhao Song, Chiung-Yi Tseng

TL;DR
This paper benchmarks GPT-OSS and other large language models on financial NLP tasks, revealing that smaller GPT-OSS models can match larger models' accuracy while being more efficient, challenging the assumption that bigger models are always better.
Contribution
It introduces a comprehensive evaluation framework and novel efficiency metrics, demonstrating that smaller GPT-OSS models achieve competitive performance at lower computational cost.
Findings
The smaller GPT-OSS-20B achieves accuracy comparable to the 120B variant (65.1% vs 66.5%).
GPT-OSS models outperform larger competitors in efficiency.
Architectural innovations enable smaller models to be more resource-effective.
Abstract
The rapid adoption of large language models in financial services necessitates rigorous evaluation frameworks to assess their performance, efficiency, and practical applicability. This paper conducts a comprehensive evaluation of the GPT-OSS model family alongside contemporary LLMs across ten diverse financial NLP tasks. Through extensive experimentation on the 120B- and 20B-parameter variants of GPT-OSS, we reveal a counterintuitive finding: the smaller GPT-OSS-20B model achieves comparable accuracy (65.1% vs 66.5%) while demonstrating superior computational efficiency, with a Token Efficiency Score of 198.4 and a processing speed of 159.80 tokens per second [1]. Our evaluation encompasses sentiment analysis, question answering, and entity recognition tasks using real-world financial datasets including Financial PhraseBank, FiQA-SA, and FLARE FINERORD. We introduce novel efficiency metrics that…
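The abstract reports throughput in tokens per second and a small accuracy gap between the two GPT-OSS variants. The sketch below shows one plausible way such figures could be computed. The function names and the example token and time counts are hypothetical and do not come from the paper. The paper's Token Efficiency Score is not defined in the text shown here, so it is not reproduced.

```python
def tokens_per_second(total_tokens: int, elapsed_seconds: float) -> float:
    """Throughput as generated tokens divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_tokens / elapsed_seconds


def accuracy_gap_points(acc_a_pct: float, acc_b_pct: float) -> float:
    """Absolute difference between two accuracies, in percentage points."""
    return round(abs(acc_a_pct - acc_b_pct), 1)


# Accuracy figures quoted in the abstract (65.1% vs 66.5%).
gap = accuracy_gap_points(65.1, 66.5)

# Hypothetical run: 7990 tokens in 50 seconds, chosen only to illustrate
# how a throughput figure like the reported 159.80 tokens/s would arise.
throughput = tokens_per_second(7990, 50.0)

print(f"accuracy gap: {gap} percentage points")
print(f"throughput: {throughput:.2f} tokens/s")
```

On this reading, the two variants differ by 1.4 percentage points of accuracy, which is the gap the abstract characterizes as comparable.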
Taxonomy
Topics: Stock Market Forecasting Methods · Big Data and Digital Economy · Explainable Artificial Intelligence (XAI)
