Explanation sensitivity to the randomness of large language models: the case of journalistic text classification

Jeremie Bogaert; Marie-Catherine de Marneffe; Antonin Descampe; Louis Escouflaire; Cedrick Fairon; Francois-Xavier Standaert

arXiv:2410.05085·cs.CL·October 8, 2024

Open Access

TL;DR

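This paper investigates how randomness in the training of large language models affects the consistency of their explanations. Models trained with different random seeds reach similar accuracy but produce variable explanations. The paper also explores a simpler feature-based model whose explanations are stable but which is less accurate, and which can be improved with explanation-derived features.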

Contribution

The paper shows that training randomness affects the stability of LLM explanations and introduces a simpler, more explainable feature-based model that can be improved with explanation-derived features.

Findings

01. Training with different seeds yields similar accuracy but variable explanations.

02. A simpler model offers stable explanations but lower accuracy.

03. Inserting explanation-derived features improves the simpler model.

Abstract

Large language models (LLMs) perform very well in several natural language processing tasks but raise explainability challenges. In this paper, we examine the effect of random elements in the training of LLMs on the explainability of their predictions. We do so on a task of opinionated journalistic text classification in French. Using a fine-tuned CamemBERT model and an explanation method based on relevance propagation, we find that training with different random seeds produces models with similar accuracy but variable explanations. We therefore claim that characterizing the explanations' statistical distribution is needed for the explainability of LLMs. We then explore a simpler model based on textual features which offers stable explanations but is less accurate. Hence, this simpler model corresponds to a different tradeoff between accuracy and explainability. We show that it can be…
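One simple way to quantify how much explanations vary across random seeds is to compare the per-token relevance scores that each trained model assigns to the same input. The sketch below is only an illustration under stated assumptions, not the paper's procedure: the relevance values are made-up placeholders, the function names `pearson` and `explanation_stability` are hypothetical, and mean pairwise Pearson correlation is just one possible agreement measure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score vectors.

    Assumes neither vector is constant (non-zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def explanation_stability(relevances_by_seed):
    """Mean pairwise correlation of relevance vectors across seeds.

    Values near 1 mean the seeds agree on which tokens matter;
    lower values mean the explanations differ between seeds.
    """
    pairs = list(combinations(relevances_by_seed.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical relevance scores for the same 4-token text,
# produced by three models trained with different seeds.
relevances_by_seed = {
    0: [0.9, 0.1, 0.0, 0.4],
    1: [0.2, 0.8, 0.1, 0.3],
    2: [0.8, 0.2, 0.1, 0.5],
}

stability = explanation_stability(relevances_by_seed)
print(f"mean pairwise correlation: {stability:.3f}")
```

Repeating this over many texts and seeds would give a distribution of agreement scores rather than a single number, which is closer in spirit to characterizing the statistical distribution of explanations.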

Taxonomy

Topics: Topic Modeling