Is GPT-OSS All You Need? Benchmarking Large Language Models for Financial Intelligence and the Surprising Efficiency Paradox

Ziqian Bi; Danyang Zhang; Junhao Song; Chiung-Yi Tseng

arXiv:2512.14717 · cs.LG · December 18, 2025

Open Access

TL;DR

This paper benchmarks the GPT-OSS model family and other large language models on ten financial NLP tasks. The smaller GPT-OSS-20B reaches accuracy comparable to the 120B variant (65.1% vs 66.5%) while running more efficiently, which challenges the assumption that bigger models are always better.

Contribution

The paper introduces a comprehensive evaluation framework and novel efficiency metrics, and shows that smaller GPT-OSS models achieve competitive performance at lower computational cost.

Findings

01 The smaller GPT-OSS-20B reaches accuracy comparable to larger models, including GPT-OSS-120B (65.1% vs 66.5%).

02 GPT-OSS models outperform larger competitors in efficiency.

03 Architectural innovations enable smaller models to be more resource-effective.

Abstract

The rapid adoption of large language models in financial services necessitates rigorous evaluation frameworks to assess their performance, efficiency, and practical applicability. This paper conducts a comprehensive evaluation of the GPT-OSS model family alongside contemporary LLMs across ten diverse financial NLP tasks. Through extensive experimentation on the 120B- and 20B-parameter variants of GPT-OSS, we reveal a counterintuitive finding: the smaller GPT-OSS-20B model achieves comparable accuracy (65.1% vs 66.5%) while demonstrating superior computational efficiency, with a Token Efficiency Score of 198.4 and a processing speed of 159.80 tokens per second [1]. Our evaluation encompasses sentiment analysis, question answering, and entity recognition tasks using real-world financial datasets including Financial PhraseBank, FiQA-SA, and FLARE FINERORD. We introduce novel efficiency metrics that…
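To put the abstract's headline numbers in perspective, the short Python sketch below works through the arithmetic. It assumes that 65.1% belongs to GPT-OSS-20B and 66.5% to GPT-OSS-120B, which is how the sentence reads, and that 159.80 tokens per second is the 20B throughput. The function names `accuracy_gap` and `generation_seconds` are hypothetical helpers for illustration, not part of the paper's framework. The Token Efficiency Score is not reproduced here because its definition is not given in this excerpt.

```python
# Figures quoted in the abstract; the model assignment is an assumption
# based on the order in which the abstract lists them.
ACC_120B = 66.5                 # GPT-OSS-120B accuracy, percent (assumed)
ACC_20B = 65.1                  # GPT-OSS-20B accuracy, percent (assumed)
TOKENS_PER_SECOND_20B = 159.80  # GPT-OSS-20B processing speed


def accuracy_gap(larger: float, smaller: float) -> tuple[float, float]:
    """Return the gap in percentage points and as a percent of the larger score."""
    absolute = larger - smaller
    relative = 100.0 * absolute / larger
    return absolute, relative


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Return the time needed to process num_tokens at a constant throughput."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    absolute, relative = accuracy_gap(ACC_120B, ACC_20B)
    print(f"Accuracy gap: {absolute:.1f} points ({relative:.2f}% of the larger score)")
    seconds = generation_seconds(1000, TOKENS_PER_SECOND_20B)
    print(f"1000 tokens at {TOKENS_PER_SECOND_20B} tok/s: {seconds:.2f} s")
```

Under these assumptions the 20B model gives up 1.4 percentage points, roughly 2% of the larger model's score, and would process 1000 tokens in a little over six seconds at the reported speed.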

Taxonomy

Topics: Stock Market Forecasting Methods · Big Data and Digital Economy · Explainable Artificial Intelligence (XAI)