I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering

Valeriya Goloviznina; Evgeny Kotelnikov

arXiv:2406.02060·cs.CL·June 5, 2024

TL;DR

This study explores how hidden states in large language models relate to their correct or incorrect answers in question answering, aiming to improve interpretability and model performance.

Contribution

It demonstrates that hidden states can distinguish correct from incorrect responses and identifies layers negatively impacting model behavior, proposing targeted training for improvement.

Findings

01. Hidden states correlate with answer correctness

02. Certain layers negatively affect model performance

03. Additional training of weak layers can enhance accuracy

Abstract

Interpretability and explainability of AI are becoming increasingly important in light of the rapid development of large language models (LLMs). This paper investigates the interpretation of LLMs in the context of knowledge-based question answering. The main hypothesis of the study is that correct and incorrect model behavior can be distinguished at the level of hidden states. The quantized models LLaMA-2-7B-Chat, Mistral-7B, and Vicuna-7B and the MuSeRC question-answering dataset are used to test this hypothesis. The results of the analysis support the proposed hypothesis. We also identify the layers that have a negative effect on the model's behavior. As a practical application of the hypothesis, we propose additionally training such "weak" layers to improve performance on the task.
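To make the hypothesis concrete, the following is a minimal sketch of a layer-wise probe, not the authors' method. It assumes each layer yields one hidden-state vector per question, labelled by whether the model answered correctly, and it substitutes synthetic Gaussian vectors for real hidden states. The names `synthetic_layer`, `probe_accuracy`, and the `shifts` values are hypothetical, and the nearest-centroid classifier is just one simple choice of probe.

```python
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def probe_accuracy(train, test):
    """Nearest-centroid probe: fit class centroids on train, score on test.

    Each item is (vector, label), where label is True for a correct answer.
    """
    pos = centroid([v for v, y in train if y])
    neg = centroid([v for v, y in train if not y])
    hits = sum((sq_dist(v, pos) < sq_dist(v, neg)) == y for v, y in test)
    return hits / len(test)


def synthetic_layer(rng, n, dim, shift):
    """Hypothetical stand-in for one layer's hidden states.

    The two classes differ only by a mean offset of +/- shift per dimension;
    shift = 0 means the layer carries no information about correctness.
    """
    data = []
    for i in range(n):
        label = i % 2 == 0
        mean = shift if label else -shift
        data.append(([rng.gauss(mean, 1.0) for _ in range(dim)], label))
    return data


rng = random.Random(0)
shifts = [0.0, 0.1, 0.5, 1.0]  # assumed per-layer separability, for illustration
layer_scores = []
for shift in shifts:
    data = synthetic_layer(rng, 400, 16, shift)
    rng.shuffle(data)
    train, test = data[:300], data[300:]
    layer_scores.append(probe_accuracy(train, test))

for layer, score in enumerate(layer_scores):
    print(f"layer {layer}: probe accuracy = {score:.2f}")
```

In this toy setting, layers whose probe accuracy stays near chance do not separate correct from incorrect answers. How the paper itself measures separation and selects "weak" layers is not specified in the abstract, so this sketch should not be read as its procedure.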

Taxonomy

Topics: Topic Modeling