Efficient Universal Perception Encoder

Chenchen Zhu; Saksham Suri; Cijo Jose; Maxime Oquab; Marc Szafraniec; Wei Wen; Yunyang Xiong; Patrick Labatut; Piotr Bojanowski; Raghuraman Krishnamoorthi; Vikas Chandra

arXiv:2603.22387 · cs.CV · April 1, 2026




TL;DR

EUPE is a compact, versatile vision encoder distilled from multiple domain-expert encoders. It performs well across diverse tasks while remaining efficient enough for edge devices.

Contribution

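Introduces EUPE, a distillation approach that first scales up from multiple domain-expert teachers to a single large proxy teacher and then scales down from that proxy to an efficient encoder, instead of distilling directly from multiple teachers as previous agglomerative methods do.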

Findings

01

EUPE matches or exceeds individual domain experts of the same size on diverse tasks.

02

EUPE outperforms previous agglomerative encoders in efficiency and performance.

03

Full EUPE models and code are publicly released.

Abstract

Running AI models on smart edge devices can unlock versatile user experiences, but presents challenges due to limited compute and the need to handle multiple tasks simultaneously. This requires a vision encoder with small size but powerful and versatile representations. We present our method, Efficient Universal Perception Encoder (EUPE), which offers both inference efficiency and universally good representations for diverse downstream tasks. We achieve this by distilling from multiple domain-expert foundation vision encoders. Unlike previous agglomerative methods that directly scale down from multiple teachers to an efficient encoder, we demonstrate the importance of first scaling up to a large proxy teacher and then scaling down from this single teacher. Experiments show that EUPE achieves on-par or better performance than individual domain experts of the same size on diverse task…
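The two-stage recipe described in the abstract (scale up to one large proxy teacher, then scale down to a small encoder) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the encoders are plain linear maps, the objective is squared-error feature matching through small adapter heads, and the names `LinearEncoder` and `distill` are hypothetical. It is not the paper's architecture, loss, or training setup, none of which are specified in this excerpt.

```python
import random

def rand_matrix(rows, cols, scale, rng):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LinearEncoder:
    """Toy encoder (assumption): features = W x; `width` stands in for model size."""
    def __init__(self, in_dim, width, rng):
        self.w = rand_matrix(width, in_dim, 0.5, rng)
        self.width = width

    def __call__(self, x):
        return matvec(self.w, x)

def avg_loss(student, heads, teachers, data):
    # Mean over samples of the summed squared feature-matching error.
    total = 0.0
    for x in data:
        h = student(x)
        for head, teacher in zip(heads, teachers):
            pred, target = matvec(head, h), teacher(x)
            total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / len(data)

def distill(student, teachers, teacher_dims, data, rng, lr=0.02, epochs=50):
    """Train `student` (plus one adapter head per teacher) to match frozen teachers."""
    heads = [rand_matrix(d, student.width, 0.5, rng) for d in teacher_dims]
    before = avg_loss(student, heads, teachers, data)
    for _ in range(epochs):
        for x in data:
            h = student(x)
            grad_h = [0.0] * student.width
            for head, teacher in zip(heads, teachers):
                err = [p - t for p, t in zip(matvec(head, h), teacher(x))]
                for i, e in enumerate(err):
                    for j in range(student.width):
                        grad_h[j] += e * head[i][j]  # read head weight before updating it
                        head[i][j] -= lr * e * h[j]
            for j, g in enumerate(grad_h):
                for k, xk in enumerate(x):
                    student.w[j][k] -= lr * g * xk
    return before, avg_loss(student, heads, teachers, data)

rng = random.Random(0)
in_dim = 4
data = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(64)]

# Two frozen stand-ins for domain-expert teachers.
teachers = [LinearEncoder(in_dim, 3, rng), LinearEncoder(in_dim, 3, rng)]

# Stage 1 (scale up): one wide proxy learns from all teachers at once.
proxy = LinearEncoder(in_dim, 8, rng)
stage1 = distill(proxy, teachers, [3, 3], data, rng)

# Stage 2 (scale down): a narrow student learns from the single proxy only.
student = LinearEncoder(in_dim, 2, rng)
stage2 = distill(student, [proxy], [8], data, rng)

print("stage 1 loss (before, after):", stage1)
print("stage 2 loss (before, after):", stage2)
```

The point of the sketch is the data flow: the small student never sees the individual teachers, only the single proxy that absorbed them. The direct multi-teacher baseline the abstract contrasts against would correspond to calling `distill(student, teachers, [3, 3], data, rng)` instead of the two stages.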


