Anchor-MoE: A Mean-Anchored Mixture of Experts For Probabilistic Regression

Baozhuo Su; Zhengxian Qu

arXiv:2508.16802·cs.LG·August 26, 2025

TL;DR

Anchor-MoE is a mixture-of-experts model for probabilistic regression that combines an anchor mean with locality-based soft routing to experts, achieving a minimax-optimal $L^2$ risk rate and outperforming strong baselines on standard UCI regression benchmarks.

Contribution

The paper proposes Anchor-MoE, a mixture-of-experts framework for probabilistic regression that comes with a minimax-optimal $L^2$ risk guarantee and outperforms strong baselines on standard UCI regression benchmarks.

Findings

01

Achieves minimax-optimal $L^2$ risk rate $O(N^{-2\alpha/(2\alpha+d)})$.

02

Scales favorably with the number of experts and latent dimensions, as measured by CRPS and NLL.

03

Outperforms strong baselines on standard UCI regression benchmarks.
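
To make the rate in finding 01 concrete, here is a worked instance. The values of $\alpha$ and $d$ below are illustrative assumptions chosen for easy arithmetic, not settings taken from the paper.

```latex
% Rate from finding 01: O(N^{-2\alpha/(2\alpha+d)})
% Illustrative values (assumed): \alpha = 2, with d = 4 and d = 12.
\[
  \alpha = 2,\ d = 4:\qquad
  N^{-\frac{2\alpha}{2\alpha+d}} = N^{-\frac{4}{4+4}} = N^{-1/2},
\]
\[
  \alpha = 2,\ d = 12:\qquad
  N^{-\frac{2\alpha}{2\alpha+d}} = N^{-\frac{4}{4+12}} = N^{-1/4}.
\]
```

At fixed smoothness, a larger input dimension $d$ shrinks the exponent, so the risk bound decays more slowly in $N$.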

Abstract

Regression under uncertainty is fundamental across science and engineering. We present an Anchored Mixture of Experts (Anchor-MoE), a model that handles both probabilistic and point regression. For simplicity, we use a tuned gradient-boosting model to furnish the anchor mean; however, any off-the-shelf point regressor can serve as the anchor. The anchor prediction is projected into a latent space, where a learnable metric-window kernel scores locality and a soft router dispatches each sample to a small set of mixture-density-network experts; the experts produce a heteroscedastic correction and predictive variance. We train by minimizing negative log-likelihood, and on a disjoint calibration split fit a post-hoc linear map on predicted means to improve point accuracy. On the theory side, assuming a Hölder smooth regression function of order $\alpha$ and fixed Lipschitz…
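
The training objective and calibration step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes Gaussian mixture components, takes the router logits and expert outputs as given arrays rather than learning them, and omits the latent projection and metric-window kernel entirely. The function names (`mixture_nll`, `mixture_moments`, `fit_linear_map`) are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def mixture_nll(y, anchor_mean, gate_logits, delta, log_sigma):
    """Mean negative log-likelihood of y under an anchored Gaussian mixture.

    Assumed form (not confirmed by the source):
        p(y | x) = sum_k pi_k(x) * N(y; anchor(x) + delta_k(x), sigma_k(x)^2)
    Shapes: y and anchor_mean are (N,); the other arrays are (N, K).
    """
    mu = anchor_mean[:, None] + delta            # expert means = anchor + correction
    log_pi = log_softmax(gate_logits)            # soft router weights, in log space
    resid = (y[:, None] - mu) / np.exp(log_sigma)
    log_comp = -0.5 * np.log(2.0 * np.pi) - log_sigma - 0.5 * resid**2
    joint = log_pi + log_comp
    m = joint.max(axis=-1, keepdims=True)        # log-sum-exp over experts
    log_lik = m[:, 0] + np.log(np.exp(joint - m).sum(axis=-1))
    return -log_lik.mean()


def mixture_moments(anchor_mean, gate_logits, delta, log_sigma):
    """Predictive mean and variance of the anchored mixture."""
    pi = np.exp(log_softmax(gate_logits))
    mu = anchor_mean[:, None] + delta
    mean = (pi * mu).sum(axis=-1)
    second_moment = (pi * (np.exp(2.0 * log_sigma) + mu**2)).sum(axis=-1)
    return mean, second_moment - mean**2


def fit_linear_map(pred_mean_cal, y_cal):
    """Least-squares fit of y ~ slope * pred_mean + intercept on a calibration split."""
    slope, intercept = np.polyfit(pred_mean_cal, y_cal, 1)
    return slope, intercept
```

In use, `mixture_nll` would be the loss minimized on the training split, and `fit_linear_map` would be applied afterwards to the output of `mixture_moments` on held-out calibration data, so the linear correction never sees the points used to fit the mixture.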
