TL;DR
This paper introduces DirMixE, a hierarchical mixture-of-experts approach for test-agnostic long-tail recognition, effectively capturing global and local label distribution variations to improve model robustness and performance.
Contribution
We propose DirMixE, a novel MoE strategy using Dirichlet meta-distributions to model local and global variations, along with a Latent Skill Finetuning framework for efficient foundation model adaptation.
Findings
DirMixE outperforms existing methods on long-tail datasets.
The hierarchical Dirichlet approach captures diverse test distributions.
Theoretical bounds support the effectiveness of variance regularization.
Abstract
This paper explores test-agnostic long-tail recognition, a challenging task in which the test label distributions are unknown and arbitrarily imbalanced. We argue that the variation in these distributions can be broken down hierarchically into global and local levels. Global variations reflect a broad range of diversity, while local variations typically arise from milder changes, often focused on a particular neighbor. Traditional methods predominantly use a Mixture-of-Experts (MoE) approach, targeting a few fixed test label distributions that exhibit substantial global variations. However, local variations are left unconsidered. To address this issue, we propose a new MoE strategy, DirMixE, which assigns experts to different Dirichlet meta-distributions of the label distribution, each targeting a specific aspect of local variations. Additionally, the diversity among these…
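The hierarchical view described in the abstract can be illustrated with a minimal sketch: a component is chosen first (global variation), and a label distribution is then drawn from that component's Dirichlet meta-distribution (local variation). This is not the authors' implementation. The three components (forward long-tail, uniform, backward long-tail), the `scale` and `imbalance` parameters, and both function names are assumptions made purely for illustration.

```python
import numpy as np


def make_meta_distributions(num_classes, scale=50.0, imbalance=10.0):
    """Build Dirichlet concentration vectors for three hypothetical components.

    The component means (forward long-tail, uniform, backward long-tail) are
    illustrative assumptions, not values taken from the paper.
    """
    # Class frequencies decaying geometrically from 1 down to 1 / imbalance.
    decay = imbalance ** (-np.arange(num_classes) / (num_classes - 1))
    forward = decay / decay.sum()
    uniform = np.full(num_classes, 1.0 / num_classes)
    backward = forward[::-1]
    means = np.stack([forward, uniform, backward])
    # A Dirichlet with concentration alpha has mean alpha / alpha.sum();
    # a larger scale keeps samples closer to that mean (smaller local variation).
    return scale * means


def sample_label_distributions(alphas, num_samples, rng):
    """Sample label distributions from a uniform mixture of Dirichlets."""
    # Global variation: pick which meta-distribution each sample comes from.
    component = rng.integers(0, len(alphas), size=num_samples)
    # Local variation: draw a label distribution around that component's mean.
    dists = np.stack([rng.dirichlet(alphas[k]) for k in component])
    return component, dists


rng = np.random.default_rng(0)
alphas = make_meta_distributions(num_classes=10)
component, dists = sample_label_distributions(alphas, num_samples=1000, rng=rng)
```

Each row of `dists` is one plausible test label distribution. Rows that share a `component` value cluster around the same mean, while rows from different components differ substantially, which mirrors the local versus global distinction drawn in the abstract.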
Taxonomy
Methods: Mixture of Experts
