HI-MoE: Hierarchical Instance-Conditioned Mixture-of-Experts for Object Detection
Vadim Vashkelis, Natalia Trukhina

TL;DR
HI-MoE introduces a hierarchical routing architecture for object detection that improves efficiency and accuracy, especially on small objects, by aligning model computation with object-centric structure.
Contribution
The paper proposes a novel two-stage hierarchical routing method for MoE in object detection, better matching the instance-centric nature of the task.
Findings
HI-MoE outperforms a dense DINO baseline on COCO.
Significant gains are observed on small objects.
Preliminary analysis shows meaningful expert specialization.
Abstract
Mixture-of-Experts (MoE) architectures enable conditional computation by activating only a subset of model parameters for each input. Although sparse routing has been highly effective in language models and has also shown promise in vision, most vision MoE methods operate at the image or patch level. This granularity is poorly aligned with object detection, where the fundamental unit of reasoning is an object query corresponding to a candidate instance. We propose Hierarchical Instance-Conditioned Mixture-of-Experts (HI-MoE), a DETR-style detection architecture that performs routing in two stages: a lightweight scene router first selects a scene-consistent expert subset, and an instance router then assigns each object query to a small number of experts within that subset. This design aims to preserve sparse computation while better matching the heterogeneous, instance-centric structure…
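The two-stage routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the mean-pooled scene descriptor, the linear scoring with one weight vector per expert, the softmax gating over the selected experts, and every name here (`hierarchical_route`, `scene_w`, `inst_w`, `subset_size`, `k`) are assumptions made for illustration only.

```python
import math
import random


def top_indices(scores, k):
    """Return the indices of the k largest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_route(queries, scene_w, inst_w, subset_size, k):
    """Two-stage routing sketch (hypothetical, not the paper's code).

    queries: list of object-query vectors, one per candidate instance.
    scene_w, inst_w: one hypothetical weight vector per expert.
    Returns (subset, assignments); assignments[q] is a list of
    (expert_id, gate_weight) pairs for query q.
    """
    dim = len(queries[0])

    # Stage 1 (scene router): score all experts once per image and keep
    # a subset. Mean-pooling the queries is an assumption of this sketch.
    scene = [sum(q[d] for q in queries) / len(queries) for d in range(dim)]
    subset = top_indices([dot(scene, w) for w in scene_w], subset_size)

    # Stage 2 (instance router): each query picks its top-k experts,
    # restricted to the scene-level subset.
    assignments = []
    for q in queries:
        scores = [dot(q, inst_w[e]) for e in subset]
        chosen = top_indices(scores, k)
        gates = softmax([scores[i] for i in chosen])
        assignments.append([(subset[i], g) for i, g in zip(chosen, gates)])
    return subset, assignments


random.seed(0)
num_experts, dim = 8, 4
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
scene_w = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
inst_w = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]

subset, assignments = hierarchical_route(queries, scene_w, inst_w, subset_size=4, k=2)
```

In this sketch each query is processed by only `k` experts, and all queries in an image draw from the same `subset_size` experts, which is one way to read the abstract's "scene-consistent expert subset".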
