Constructing a Portfolio Optimization Benchmark Framework for Evaluating Large Language Models

Hanyong Cho; Jang Ho Kim

arXiv:2603.09301·q-fin.PM·March 11, 2026

Open Access

TL;DR

This paper develops a benchmark framework that evaluates how well large language models solve portfolio optimization problems, using questions with mathematically explicit solutions to test their reasoning in financial decision-making.

Contribution

The paper introduces a benchmark for assessing LLMs' optimization-based reasoning in finance, going beyond the language-processing tasks that existing financial benchmarks emphasize.

Findings

01 GPT-4 outperforms others in risk-based objectives

02 Gemini 1.5 Pro excels in return-based tasks

03 Llama 3.1-70B shows the lowest overall performance

Abstract

This study introduces a benchmark framework for evaluating the financial decision-making capabilities of large language models (LLMs) through portfolio optimization problems with mathematically explicit solutions. Unlike existing financial benchmarks that emphasize language-processing tasks, the proposed framework directly tests optimization-based reasoning in investment contexts. A large set of multiple-choice questions is generated by varying objectives, candidate assets, and investment constraints, with each problem designed to include a unique correct solution and systematically constructed alternatives. Experimental results comparing GPT-4, Gemini 1.5 Pro, and Llama 3.1-70B reveal distinct performance patterns: GPT achieves the highest accuracy in risk-based objectives and remains stable under constraints, Gemini performs well in return-based tasks but struggles under other…
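To make the idea of a question with a "mathematically explicit solution" concrete, here is a minimal sketch of how one such multiple-choice item might be built. It is not the authors' generator: the two-asset minimum-variance objective, the function names (`min_variance_weights`, `portfolio_variance`, `build_question`), and the three distractor rules (equal weights, swapped weights, inverse-volatility weights) are all assumptions chosen for illustration.

```python
import random


def min_variance_weights(sigma1, sigma2, rho):
    """Closed-form minimum-variance weights for two fully invested assets.

    Assumes short positions are allowed, so no clipping is applied.
    """
    cov = rho * sigma1 * sigma2
    denom = sigma1 ** 2 + sigma2 ** 2 - 2 * cov
    if denom == 0:
        raise ValueError("assets are perfectly correlated with equal volatility")
    w1 = (sigma2 ** 2 - cov) / denom
    return (w1, 1 - w1)


def portfolio_variance(weights, sigma1, sigma2, rho):
    """Variance of a two-asset portfolio with the given weights."""
    w1, w2 = weights
    return (w1 ** 2 * sigma1 ** 2 + w2 ** 2 * sigma2 ** 2
            + 2 * w1 * w2 * rho * sigma1 * sigma2)


def build_question(sigma1, sigma2, rho, seed=0):
    """Return shuffled answer options and the index of the correct one.

    The distractor rules below are hypothetical, not taken from the paper.
    """
    correct = min_variance_weights(sigma1, sigma2, rho)
    distractors = [
        (0.5, 0.5),                                   # equal weights
        (correct[1], correct[0]),                     # correct weights swapped
        (sigma2 / (sigma1 + sigma2),
         sigma1 / (sigma1 + sigma2)),                 # inverse-volatility weights
    ]
    # Keep the correct answer unique: skip any distractor that coincides
    # with an option already in the list.
    options = [correct]
    for d in distractors:
        if all(abs(d[0] - o[0]) > 1e-9 for o in options):
            options.append(d)
    random.Random(seed).shuffle(options)
    return {"options": options, "answer": options.index(correct)}


question = build_question(sigma1=0.20, sigma2=0.30, rho=0.25)
```

With volatilities of 0.20 and 0.30 and a correlation of 0.25, the closed form gives weights of 0.75 and 0.25, and every distractor has a strictly higher portfolio variance, so the item has exactly one correct option. A model's reply can then be scored by comparing its chosen index with `question["answer"]`.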

Taxonomy

Topics: Stock Market Forecasting Methods · Explainable Artificial Intelligence (XAI) · Financial Distress and Bankruptcy Prediction