Simplifying Outcomes of Language Model Component Analyses with ELIA
Aaron Louis Eidt, Nils Feldhus

TL;DR
ELIA is an interactive web tool that simplifies language model analysis results, making them accessible to non-experts through natural language explanations and visualizations. The approach was validated in a mixed-methods user study.
Contribution
This work introduces ELIA, a novel system that combines multiple analysis techniques with AI-generated explanations to make interpretability results more accessible.
Findings
AI-generated explanations bridge knowledge gaps for non-experts
Users preferred interactive visualizations over static ones
The system reduces barriers regardless of user experience level
Abstract
While mechanistic interpretability has developed powerful tools to analyze the internal workings of Large Language Models (LLMs), their complexity has created an accessibility gap, limiting their use to specialists. We address this challenge by designing, building, and evaluating ELIA (Explainable Language Interpretability Analysis), an interactive web application that simplifies the outcomes of various language model component analyses for a broader audience. The system integrates three key techniques -- Attribution Analysis, Function Vector Analysis, and Circuit Tracing -- and introduces a novel methodology: using a vision-language model to automatically generate natural language explanations (NLEs) for the complex visualizations produced by these methods. The effectiveness of this approach was empirically validated through a mixed-methods user study, which revealed a clear preference…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods
