Efficacy of Large Language Models in Systematic Reviews
Aaditya Shah, Shridhar Mehendale, Siddha Kanthi

TL;DR
This paper evaluates the effectiveness of large language models in conducting systematic reviews of ESG literature, demonstrating that fine-tuned models can significantly outperform standard LLMs in accuracy, aiding faster decision-making for investors.
Contribution
It introduces a novel application of fine-tuned LLMs for systematic literature review tasks, showing improved accuracy over existing models in ESG research interpretation.
Findings
Fine-tuned GPT-4o Mini outperforms base LLMs by 28.3% in accuracy.
Custom GPT improves accuracy by 3.0% and 15.7% on different prompts.
LLMs can effectively summarize complex ESG evidence for investment decisions.
Abstract
This study investigates the effectiveness of Large Language Models (LLMs) in interpreting existing literature through a systematic review of the relationship between Environmental, Social, and Governance (ESG) factors and financial performance. The primary objective is to assess how LLMs can replicate a systematic review on a corpus of ESG-focused papers. We compiled and hand-coded a database of 88 relevant papers published from March 2020 to May 2024. Additionally, we used a set of 238 papers from a previous systematic review of ESG literature from January 2015 to February 2020. We evaluated two current state-of-the-art LLMs, Meta AI's Llama 3 8B and OpenAI's GPT-4o, on the accuracy of their interpretations relative to human-made classifications on both sets of papers. We then compared these results to a "Custom GPT" and a fine-tuned GPT-4o Mini model using the corpus of 238 papers as…
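The accuracy comparison described in the abstract can be illustrated with a minimal sketch. It assumes accuracy means the share of papers where the model's classification matches the human-coded one. The label names and sample data below are hypothetical and do not come from the paper.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of papers where the model label equals the human-coded label."""
    if len(predicted) != len(human):
        raise ValueError("predicted and human label lists must have equal length")
    if not human:
        raise ValueError("at least one labeled paper is required")
    matches = sum(p == h for p, h in zip(predicted, human))
    return matches / len(human)


# Hypothetical labels for five papers (illustrative only, not the paper's data).
human_labels = ["positive", "negative", "mixed", "positive", "neutral"]
base_labels = ["positive", "positive", "mixed", "neutral", "neutral"]
tuned_labels = ["positive", "negative", "mixed", "positive", "positive"]

base_acc = accuracy(base_labels, human_labels)
tuned_acc = accuracy(tuned_labels, human_labels)

# Difference expressed in percentage points.
gain_points = (tuned_acc - base_acc) * 100

print(f"base: {base_acc:.1%}, fine-tuned: {tuned_acc:.1%}, gain: {gain_points:.1f} points")
```

The sketch reports the gain in percentage points. The summary here does not say whether its reported improvements are percentage-point or relative differences, so that choice is also an assumption.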
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling
Methods: LLaMA · Balanced Selection · Sparse Evolutionary Training
