Evaluating the Accuracy of Chatbots in Financial Literature

Orhan Erdem; Kristi Hassett; Feyzullah Egriboyun

arXiv:2411.07031 · cs.AI · February 18, 2025


Open Access

TL;DR

This study assesses the accuracy of the ChatGPT and Gemini Advanced chatbots in citing financial literature, revealing high hallucination rates and examining how topic recency affects their reliability.

Contribution

Introduces a nonbinary evaluation approach and a recency measure for assessing chatbot hallucinations in financial literature citations, complementing the conventional binary approach and providing new insights into chatbot reliability.

Findings

1. ChatGPT-4o hallucination rate: 20.0%
2. Gemini Advanced hallucination rate: 76.7%
3. Hallucination rates increase with topic recency, although the trend is not statistically significant for Gemini Advanced

Abstract

We evaluate the reliability of two chatbots, ChatGPT (4o and o1-preview versions) and Gemini Advanced, in providing references on financial literature, using novel methodologies. Alongside the conventional binary approach commonly used in the literature, we developed a nonbinary approach and a recency measure to assess how hallucination rates vary with the recency of a topic. After analyzing 150 citations, ChatGPT-4o had a hallucination rate of 20.0% (95% CI, 13.6%-26.4%), while o1-preview had a hallucination rate of 21.3% (95% CI, 14.8%-27.9%). In contrast, Gemini Advanced exhibited a much higher hallucination rate of 76.7% (95% CI, 69.9%-83.4%). Although hallucination rates increased for more recent topics, this trend was not statistically significant for Gemini Advanced. These findings emphasize the importance of verifying chatbot-provided references, particularly in rapidly evolving…
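The reported 95% confidence intervals appear consistent with a normal-approximation (Wald) binomial interval, p ± 1.96·sqrt(p(1 − p)/n), with n = 150 per model. The sketch below reproduces them under two assumptions that the source does not state: each citation is scored as a binary hallucinated/not-hallucinated outcome, and the hallucination counts (30, 32, 115) are back-calculated from the rounded percentages. The function name `wald_ci` is hypothetical and is not the authors' code.

```python
import math


def wald_ci(hallucinated: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (rate, lower, upper) for a binomial proportion using the
    normal-approximation (Wald) interval: p +/- z * sqrt(p * (1 - p) / n)."""
    p = hallucinated / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1] so the interval remains a valid range for a proportion.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts back-calculated from the reported percentages (n = 150);
# these are an assumption, not figures taken from the paper.
inferred_counts = {
    "ChatGPT-4o": 30,
    "o1-preview": 32,
    "Gemini Advanced": 115,
}

for model, count in inferred_counts.items():
    rate, lower, upper = wald_ci(count, 150)
    print(f"{model}: {rate:.1%} (95% CI, {lower:.1%}-{upper:.1%})")
```

Running this prints intervals matching those in the abstract (for example, "ChatGPT-4o: 20.0% (95% CI, 13.6%-26.4%)"). The Wald interval is the simplest choice. A Wilson or exact (Clopper-Pearson) interval would give slightly different bounds, particularly for rates near 0% or 100%.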


Taxonomy

Topics: FinTech, Crowdfunding, Digital Finance · Stock Market Forecasting Methods