Indic-TunedLens: Interpreting Multilingual Models in Indian Languages
Mihir Panchal, Deeksha Varshney, Mamta, Asif Ekbal

TL;DR
Indic-TunedLens is a new interpretability framework designed for Indian languages in multilingual models, improving understanding of model representations especially for low-resource, morphologically rich languages.
Contribution
It introduces a novel affine transformation approach tailored for Indian languages, enhancing cross-lingual interpretability over existing methods.
Findings
Significantly outperforms SOTA interpretability methods on 10 Indian languages.
Provides insights into semantic encoding in multilingual transformer layers.
Effective for morphologically rich, low-resource languages.
Abstract
Multilingual large language models (LLMs) are increasingly deployed in linguistically diverse regions like India, yet most interpretability tools remain tailored to English. Prior work reveals that LLMs often operate in English centric representation spaces, making cross lingual interpretability a pressing concern. We introduce Indic-TunedLens, a novel interpretability framework specifically for Indian languages that learns shared affine transformations. Unlike the standard Logit Lens, which directly decodes intermediate activations, Indic-TunedLens adjusts hidden states for each target language, aligning them with the target output distributions to enable more faithful decoding of model representations. We evaluate our framework on 10 Indian languages using the MMLU benchmark and find that it significantly improves over SOTA interpretability methods, especially for morphologically…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsExplainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling
