Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts

Gene Tangtartharakul; Katherine R. Storrs

arXiv:2605.20610 · cs.CV · May 21, 2026


TL;DR

This paper investigates how vision Mixture-of-Experts models encode information, showing that expert specialization extends beyond category routing to continuous visual and semantic features, and that the resulting patterns are interpretable and stable across initializations.
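To make the category-versus-feature distinction concrete, here is a minimal sketch, not the authors' code. It assumes each expert's response to an image can be summarised as a single number, and that every image has a score on a set of semantic dimensions (the paper derives such dimensions from THINGS human judgements). It ranks the dimensions by their Spearman correlation with one expert's responses. The function names, dimension labels and toy data are all hypothetical.

```python
from statistics import mean


def rank(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def tuning_profile(expert_response, dimension_scores):
    """Correlate one expert's per-image responses with each semantic dimension.

    expert_response: per-image scalar responses (hypothetical summary).
    dimension_scores: dict of dimension name -> per-image scores.
    Returns (name, rho) pairs, strongest |rho| first.
    """
    profile = [(name, spearman(expert_response, scores))
               for name, scores in dimension_scores.items()]
    return sorted(profile, key=lambda p: abs(p[1]), reverse=True)


if __name__ == "__main__":
    response = [0.9, 0.8, 0.7, 0.2, 0.1]  # toy responses to five images
    dims = {  # hypothetical dimension labels and scores
        "animal-related": [5.0, 4.0, 3.0, 1.0, 0.5],
        "metallic": [0.1, 0.4, 0.2, 3.0, 4.0],
        "round": [1.0, 3.0, 2.0, 2.5, 1.5],
    }
    for name, rho in tuning_profile(response, dims):
        print(f"{name}: {rho:+.2f}")
```

With this toy data the profile lists animal-related first (+1.00), then metallic (-0.90), then round (-0.10). A graded correlation with a continuous dimension, rather than a clean split by category label, is the kind of signal a feature-level reading looks for.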

Contribution

The study moves from routing-level to expert-level analyses using tools from visual neuroscience, showing that expert tuning reflects continuous features rather than category membership alone, and that this tuning is stable across initializations.
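One way to read "most exciting inputs" is the set of dataset images that drive an expert most strongly. The sketch below assumes that reading (the paper may instead synthesise inputs) and a hypothetical scalar response per image. It selects the top-k images for one expert and reports the category mix among them, the kind of summary that can expose an animate-inanimate split.

```python
import heapq
from collections import Counter


def most_exciting_inputs(responses, k):
    """Indices of the k images with the largest response for one expert,
    ordered from strongest to weakest."""
    return heapq.nlargest(k, range(len(responses)), key=lambda i: responses[i])


def category_composition(indices, labels):
    """Fraction of each category label among the selected images."""
    counts = Counter(labels[i] for i in indices)
    return {label: n / len(indices) for label, n in counts.items()}


if __name__ == "__main__":
    # Toy responses of one expert to six images, with hypothetical coarse labels.
    responses = [0.1, 0.9, 0.4, 0.8, 0.3, 0.7]
    labels = ["inanimate", "animate", "inanimate", "animate", "inanimate", "animate"]
    top = most_exciting_inputs(responses, k=3)
    print(top, category_composition(top, labels))  # [1, 3, 5] {'animate': 1.0}
```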

Findings

01. Expert partitioning is dominated by an animate-inanimate distinction.

02. Beyond category membership, experts are broadly tuned to continuous visual and semantic features.

03. Expert specialization is stable across independent initializations.
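Comparing experts across initialisations is complicated by the fact that expert indices are arbitrary: expert 3 in one run need not correspond to expert 3 in another. A stimulus-level representational similarity analysis (RSA) sidesteps this. The sketch below is an illustration under stated assumptions, not the paper's procedure: each image is represented by a hypothetical vector of per-expert gate weights, each model's representational dissimilarity matrix (RDM) uses correlation distance, and the two RDMs are compared with a Spearman correlation over their upper triangles.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant input")
    return cov / (vx * vy) ** 0.5


def rank(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rdm_upper(vectors):
    """Upper triangle of an RDM over images, using correlation distance 1 - r."""
    return [1.0 - pearson(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)]


def rsa(vectors_a, vectors_b):
    """Spearman correlation between two models' RDMs over the same images."""
    return pearson(rank(rdm_upper(vectors_a)), rank(rdm_upper(vectors_b)))


if __name__ == "__main__":
    # Hypothetical gate weights: 4 images x 3 experts for one initialisation.
    seed_a = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.1, 0.1, 0.8], [0.0, 0.3, 0.7]]
    # A second run that learned the same allocation under different expert labels.
    perm = [2, 0, 1]
    seed_b = [[row[p] for p in perm] for row in seed_a]
    print(f"RSA across seeds: {rsa(seed_a, seed_b):.3f}")
```

Because the correlation between two images' vectors is unchanged when the same permutation is applied to both, relabelling the experts leaves each RDM intact, so the demo reports an RSA of 1.000. Shuffling the image order in one model, by contrast, breaks the correspondence and lowers the score.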

Abstract

Mixture-of-Experts (MoE) models are often interpreted by analysing which categories are routed to which experts. However, routing alone does not reveal what each expert actually encodes. We train sparsely-gated convolutional MoE models with a contrastive objective on natural images and characterise expert specialisation using tools from visual neuroscience. Extending from gating-level to expert-level analyses, we measure per-expert category separability and per-expert tuning using the most exciting inputs. Extending from category-level to feature-level explanations, we interpret tuning via semantic dimensions derived from a dataset of human behavioural judgements (THINGS). Finally, we use tuning and representational similarity analysis to assess the stability of expertise allocation across independent initialisations. We find that an animate-inanimate distinction dominates expert…
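For readers unfamiliar with sparse gating, here is a minimal sketch of top-k routing in the style of the sparsely-gated MoE layer of Shazeer et al. (2017). The abstract does not give the gating details, so the value of k, the omission of noise and load-balancing terms, passing gate logits in directly instead of computing them with a learned gating network, and treating each expert as a function on a flat vector (rather than a convolutional block on feature maps) are all simplifying assumptions.

```python
import math


def top_k_gating(logits, k):
    """Keep the k largest gate logits, softmax over them, zero the rest.

    Noise and load-balancing terms used in practice are omitted.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in chosen}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]


def moe_forward(x, experts, gate_logits, k):
    """Weighted sum of the selected experts' outputs.

    Assumes every expert maps a vector to one of the same length. Experts
    with zero weight are never called, which is where sparse gating saves compute.
    """
    weights = top_k_gating(gate_logits, k)
    out = [0.0] * len(x)
    for w, expert in zip(weights, experts):
        if w == 0.0:
            continue
        y = expert(x)
        for d in range(len(out)):
            out[d] += w * y[d]
    return out


if __name__ == "__main__":
    calls = []

    def make_expert(scale):
        def expert(v):
            calls.append(scale)  # record which experts actually run
            return [scale * a for a in v]
        return expert

    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    print(moe_forward([1.0, 2.0], experts, gate_logits=[2.0, 1.0, 0.0, -1.0], k=2))
    print("experts evaluated:", calls)
```

With k=2, only the two experts with the highest gate logits are evaluated; the routing statistics that routing-based interpretations analyse are exactly these per-input selections.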
