Understanding Expert Structures on Minimax Parameter Estimation in Contaminated Mixture of Experts

Fanqi Yan; Huy Nguyen; Dung Le; Pedram Akbarian; Nhat Ho

arXiv:2410.12258 · cs.LG · March 7, 2025

TL;DR

This paper analyzes the convergence of parameter estimation in contaminated mixture of experts models, addressing challenges like prompt vanishing and parameter interaction, and provides theoretical and empirical insights into expert structure effects.

Contribution

It introduces a distinguishability condition and investigates various expert structures, offering convergence rates and minimax bounds for parameter estimation.

Findings

01 Convergence rates are established for different expert structures.

02 A distinguishability condition helps control parameter interactions.

03 Numerical experiments support theoretical results.

Abstract

We conduct a convergence analysis of parameter estimation in the contaminated mixture of experts. This model is motivated by the prompt learning problem, where one utilizes prompts, which can be formulated as experts, to fine-tune a large-scale pre-trained model for learning downstream tasks. Two fundamental challenges emerge from the analysis: (i) the proportion in the mixture of the pre-trained model and the prompt may converge to zero during training, leading to the prompt vanishing issue; (ii) algebraic interaction among the parameters of the pre-trained model and the prompt can occur via some partial differential equations and decelerate prompt learning. In response, we introduce a distinguishability condition to control this parameter interaction. We also investigate various types of expert structure to understand their effects on the…
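
To make challenge (i) concrete, here is a minimal sketch of a contaminated mixture density, assuming Gaussian experts: a frozen pre-trained component is mixed with a trainable prompt component using weight `lam`. The function names, the tanh pre-trained mean, the linear prompt expert, and all numeric values are hypothetical choices for illustration and are not taken from the paper. The sketch shows that as `lam` approaches zero the density stops depending on the prompt parameters, which is one way to read the prompt vanishing issue.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def pretrained_mean(x):
    """Hypothetical frozen pre-trained expert; tanh is an arbitrary choice."""
    return math.tanh(x)


def contaminated_density(y, x, lam, prompt_weights, prompt_var, pretrained_var=1.0):
    """(1 - lam) * pre-trained density + lam * prompt density.

    Assumption: the prompt is a linear expert with mean a * x + b.
    """
    a, b = prompt_weights
    f0 = gaussian_pdf(y, pretrained_mean(x), pretrained_var)
    f = gaussian_pdf(y, a * x + b, prompt_var)
    return (1.0 - lam) * f0 + lam * f


if __name__ == "__main__":
    x, y = 1.0, 0.3
    # Two very different prompts become indistinguishable as lam shrinks.
    for lam in (0.5, 0.1, 0.0):
        p1 = contaminated_density(y, x, lam, (2.0, -1.0), 0.5)
        p2 = contaminated_density(y, x, lam, (-3.0, 2.0), 2.0)
        print(lam, abs(p1 - p2))
```

The printed gap between the two prompts shrinks with `lam`, so data generated near `lam = 0` carries little information about the prompt parameters.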

Taxonomy

Topics: Advanced Statistical Process Monitoring