I've got the "Answer"! Interpretation of LLMs Hidden States in Question Answering
Valeriya Goloviznina, Evgeny Kotelnikov

TL;DR
This study examines how the hidden states of large language models relate to whether the models answer correctly or incorrectly in question answering, with the aim of improving interpretability and model performance.
Contribution
It presents evidence that hidden states can distinguish correct from incorrect responses, identifies layers that negatively affect model behavior, and proposes additional training of those layers as a possible improvement.
Findings
Hidden states correlate with answer correctness
Certain layers negatively affect model performance
Additional training of "weak" layers is proposed as a way to improve accuracy
Abstract
Interpretability and explainability of AI are becoming increasingly important in light of the rapid development of large language models (LLMs). This paper investigates the interpretation of LLMs in the context of knowledge-based question answering. The main hypothesis of the study is that correct and incorrect model behavior can be distinguished at the level of hidden states. The quantized models LLaMA-2-7B-Chat, Mistral-7B, and Vicuna-7B and the MuSeRC question-answering dataset are used to test this hypothesis. The results of the analysis support the proposed hypothesis. We also identify the layers that have a negative effect on the model's behavior. As a prospect for practical application of the hypothesis, we propose additionally training such "weak" layers in order to improve the quality of the task solution.
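The abstract does not say how hidden states are compared across layers, so the following is only a minimal sketch of one common way to test such a hypothesis: a per-layer probe that tries to predict answer correctness from hidden-state vectors. Everything here is an assumption for illustration. The nearest-centroid probe stands in for whatever analysis the authors used, the vectors are synthetic Gaussian data rather than real LLaMA-2, Mistral, or Vicuna activations, and the names `probe_accuracy`, `synthetic_layer`, and `separations` are hypothetical.

```python
import random

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def probe_accuracy(train, test):
    """Nearest-centroid probe.

    train/test are lists of (vector, label); label 1 = correct answer,
    label 0 = incorrect answer.
    """
    c0 = centroid([v for v, y in train if y == 0])
    c1 = centroid([v for v, y in train if y == 1])
    hits = sum((sq_dist(v, c1) < sq_dist(v, c0)) == bool(y) for v, y in test)
    return hits / len(test)

def synthetic_layer(rng, n, dim, separation):
    """Hypothetical stand-in for one layer's hidden states.

    Vectors for label 1 are shifted by `separation` in every dimension.
    """
    data = []
    for i in range(n):
        y = i % 2
        data.append(([rng.gauss(y * separation, 1.0) for _ in range(dim)], y))
    return data

rng = random.Random(0)
separations = [0.0, 0.5, 2.0]  # made-up values, one per toy "layer"
layer_scores = []
for sep in separations:
    data = synthetic_layer(rng, 200, 8, sep)
    rng.shuffle(data)
    # Fit the probe on 150 examples, evaluate on the remaining 50.
    layer_scores.append(probe_accuracy(data[:150], data[150:]))

for idx, acc in enumerate(layer_scores):
    print(f"layer {idx}: probe accuracy = {acc:.2f}")
```

In this toy setting, a layer whose probe accuracy stays near chance carries little information about correctness. With real models, the vectors would be hidden states extracted per layer for each question, and which layers count as "weak" would depend on the criterion the paper actually uses.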
Taxonomy
Topics: Topic Modeling
