What Factors Affect LLMs and RLLMs in Financial Question Answering?
Peng Wang, Xuesi Hu, Jiageng Wu, Yuntao Zou, Qiancheng Zhang, Dagang Li

TL;DR
This paper systematically explores how various prompting, framework, and multilingual alignment methods affect the performance of LLMs and RLLMs in financial question answering, revealing inherent differences and limitations.
Contribution
It provides a comprehensive analysis of factors influencing LLM and RLLM performance in finance, highlighting the limitations of current methods for RLLMs and the benefits of prompting in LLMs.
Findings
Prompting and agent frameworks improve LLM performance via Long CoT.
RLLMs already have innate Long CoT capabilities, which limits the further gains these methods can provide.
Multilingual alignment methods help LLMs mainly by extending their reasoning length, and help RLLMs much less.
Abstract
Recently, large language models (LLMs) and reasoning large language models (RLLMs) have attracted considerable research attention. RLLMs enhance the reasoning capabilities of LLMs through Long Chain-of-Thought (Long CoT) processes, significantly improving performance on complex problems. However, few works systematically explore which methods can fully unlock the performance of LLMs and RLLMs within the financial domain. To investigate the impact of various methods on LLMs and RLLMs, we use five LLMs and four RLLMs to assess the effects of prompting methods, agentic frameworks, and multilingual alignment methods on financial question-answering tasks. Our findings indicate: (1) Current prompting methods and agent frameworks enhance the performance of LLMs in financial question answering by simulating Long CoT; (2) RLLMs possess…
