Argument Quality Annotation and Gender Bias Detection in Financial Communication through Large Language Models
Alaa Alhamzeh, Mays Al Rebdawi

TL;DR
This study evaluates large language models' ability to annotate financial argument quality and detect gender bias, comparing their performance to human annotations and analyzing bias robustness across different settings.
Contribution
It introduces a comprehensive evaluation of LLMs for financial argument annotation and gender bias detection, highlighting their strengths and limitations.
Findings
LLMs outperform humans in annotation consistency.
Models show varying degrees of gender bias.
Annotation stability is influenced by temperature settings.
Abstract
Financial arguments play a critical role in shaping investment decisions and public trust in financial institutions. Nevertheless, assessing their quality remains poorly studied in the literature. In this paper, we examine the capabilities of three state-of-the-art LLMs (GPT-4o, Llama 3.1, and Gemma 2) in annotating argument quality within financial communications, using the FinArgQuality dataset. Our contributions are twofold. First, we evaluate the consistency of LLM-generated annotations across multiple runs and benchmark them against human annotations. Second, we introduce an adversarial attack designed to inject gender bias, in order to analyse the models' responses and assess their fairness and robustness. Both experiments are conducted across three temperature settings to assess their influence on annotation stability and alignment with human labels. Our findings reveal that LLM-based…
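The abstract does not say which agreement statistic is used, so the following is only a minimal sketch of how run-to-run consistency and alignment with human labels could be measured. It assumes Cohen's kappa as the metric; the labels, the number of runs, and the temperature values are all invented for illustration and are not taken from the paper or from FinArgQuality.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Fraction of items on which the two annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use the same single label throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(runs):
    """Average kappa over all pairs of runs: a simple stability score."""
    pairs = list(combinations(runs, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: quality labels for six arguments (assumption, not real data).
human = [1, 0, 2, 1, 0, 2]
runs_by_temperature = {
    0.0: [[1, 0, 2, 1, 0, 2], [1, 0, 2, 1, 0, 2], [1, 0, 2, 1, 0, 2]],
    1.0: [[1, 0, 2, 1, 0, 2], [1, 1, 2, 1, 0, 2], [1, 0, 2, 0, 0, 2]],
}

for temperature, runs in runs_by_temperature.items():
    stability = mean_pairwise_kappa(runs)
    alignment = sum(cohen_kappa(run, human) for run in runs) / len(runs)
    print(f"T={temperature}: stability={stability:.2f}, alignment={alignment:.2f}")
```

The same two scores could be recomputed on inputs modified by the gender-bias attack, which would show whether the perturbation shifts either stability or agreement with human labels.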
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Stock Market Forecasting Methods
