SigLino: Efficient Multi-Teacher Distillation for Agglomerative Vision Foundation Models

Sofian Chaybouti; Sanath Narayan; Yasser Dahou; Phúc H. Lê Khac; Ankit Singh; Ngoc Dung Huynh; Wamiq Reyaz Para; Hilde Kuehne; Hakim Hacid

arXiv:2512.20157 · cs.CV · April 8, 2026

1 Repo 5 Models

TL;DR

SigLino distills SigLIP2 and DINOv3 into a single student vision model. It combines an Asymmetric Relation-Knowledge Distillation loss, token-balanced batching, hierarchical data sampling, and a large curated dataset to improve transferability while lowering training cost.

Contribution

The paper presents SigLino, a family of agglomerative vision foundation models distilled from SigLIP2 and DINOv3 into Dense and Mixture-of-Experts students, studies which factors make multi-teacher distillation cheaper to train, and releases a large curated training dataset.

Findings

01. The Asymmetric Relation-Knowledge Distillation loss transfers knowledge effectively while preserving the geometric properties of each teacher.

02. Token-balanced batching, which packs varying-resolution images into sequences with uniform token budgets, stabilizes representation learning across resolutions.

03. Hierarchical data sampling improves sample efficiency over random sampling.
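
To illustrate the general idea behind the third finding, here is a minimal sketch of two-level (hierarchical) sampling in Python. This page does not describe the hierarchy the paper actually uses, so the group labels, the uniform choice over groups, and the function name `hierarchical_sample` are all assumptions made for illustration only.

```python
import random
from typing import Dict, List, Sequence


def hierarchical_sample(
    groups: Dict[str, Sequence[str]], k: int, seed: int = 0
) -> List[str]:
    """Draw k items by first picking a group, then an item inside it.

    Assumption: groups are chosen uniformly, so small groups are not
    drowned out by large ones. The paper's real scheme may differ.
    """
    rng = random.Random(seed)
    # Skip empty groups and sort names so results depend only on the seed.
    names = sorted(name for name, items in groups.items() if len(items) > 0)
    if not names:
        raise ValueError("no non-empty groups to sample from")
    samples = []
    for _ in range(k):
        group = rng.choice(names)                   # level 1: pick a group
        samples.append(rng.choice(groups[group]))   # level 2: pick an item
    return samples
```

With plain random sampling over the pooled items, a group holding 1% of the data supplies about 1% of each batch. With the two-level draw above, each non-empty group supplies a roughly equal share regardless of its size.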

Abstract

Vision foundation models trained via multi-teacher distillation offer a promising path toward unified visual representations, yet the learning dynamics and data efficiency of such approaches remain underexplored. In this paper, we systematically study multi-teacher distillation for vision foundation models and identify key factors that enable training at lower computational cost. We introduce SigLino, an efficient family of agglomerative vision foundation models that distill knowledge from SigLIP2 and DINOv3 simultaneously into Dense and Mixture-of-Experts students. We show that (1) our Asymmetric Relation-Knowledge Distillation loss preserves the geometric properties of each teacher while enabling effective knowledge transfer, (2) token-balanced batching that packs varying-resolution images into sequences with uniform token budgets stabilizes representation learning across resolutions…
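
The token-balanced batching mentioned in point (2) can be sketched as a bin-packing problem. The Python below is a hedged illustration, not the authors' implementation: the patch size of 16, the first-fit-decreasing heuristic, and the names `num_tokens` and `pack_by_token_budget` are assumptions.

```python
from typing import List, Sequence, Tuple


def num_tokens(height: int, width: int, patch: int = 16) -> int:
    """Patch-token count for an image, rounding partial patches up."""
    rows = (height + patch - 1) // patch
    cols = (width + patch - 1) // patch
    return rows * cols


def pack_by_token_budget(
    sizes: Sequence[Tuple[int, int]], budget: int, patch: int = 16
) -> List[List[int]]:
    """Group image indices into sequences holding at most `budget` tokens.

    Uses first-fit decreasing: largest images are placed first, each into
    the first sequence that still has room. (Assumed heuristic.)
    """
    counts = [num_tokens(h, w, patch) for h, w in sizes]
    order = sorted(range(len(sizes)), key=lambda i: counts[i], reverse=True)
    bins: List[List[int]] = []
    loads: List[int] = []
    for i in order:
        n = counts[i]
        if n > budget:
            raise ValueError(f"image {i} needs {n} tokens, budget is {budget}")
        for b, load in enumerate(loads):
            if load + n <= budget:
                bins[b].append(i)
                loads[b] += n
                break
        else:
            # No existing sequence has room, so open a new one.
            bins.append([i])
            loads.append(n)
    return bins
```

For example, with images of 224×224, 448×448, 224×224, 512×512, and 256×256 pixels and a budget of 1024 tokens, the sketch returns three sequences, `[[3], [1, 0], [4, 2]]`. Each training step then processes a bounded number of tokens, regardless of how many images or which resolutions it contains.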

Code & Models

Repositories

tiiuae/amoe (GitHub)
