Stratified Knowledge-Density Super-Network for Scalable Vision Transformers

Longhua Li; Lei Qi; Xin Geng

arXiv:2511.11683 · cs.LG · November 18, 2025



TL;DR

This paper introduces a hierarchical super-network for vision transformers that enables scalable deployment across resource levels by organizing knowledge efficiently and applying novel importance-aware dropout techniques.

Contribution

It proposes a stratified knowledge-density super-network that uses WPAC for knowledge concentration and PIAD for knowledge stratification, improving the scalability and efficiency of ViT models.

Findings

1. WPAC outperforms existing pruning methods in knowledge concentration.
2. PIAD effectively promotes knowledge stratification during training.
3. The combined approach offers a competitive alternative to traditional model compression and expansion.

Abstract

Training and deploying multiple vision transformer (ViT) models for different resource constraints is costly and inefficient. To address this, we propose transforming a pre-trained ViT into a stratified knowledge-density super-network, where knowledge is hierarchically organized across weights. This enables flexible extraction of sub-networks that retain maximal knowledge for varying model sizes. We introduce Weighted PCA for Attention Contraction (WPAC), which concentrates knowledge into a compact set of critical weights. WPAC applies token-wise weighted principal component analysis to intermediate features and injects the resulting transformation and inverse matrices into adjacent layers, preserving the original network function while enhancing knowledge compactness. To further promote stratified knowledge organization, we propose…
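The WPAC step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes two adjacent linear layers with no nonlinearity between them (for example an attention head's value and output projections, where attention mixes along the token axis and therefore commutes with a rotation of the feature axis). It also uses an uncentered weighted second-moment matrix and takes the token weights as a given input. The function names, the weighting scheme, and the centering choice are hypothetical and are not taken from the paper.

```python
import numpy as np


def weighted_pca_basis(feats, token_weights):
    """Orthonormal basis of the feature axis, ordered by weighted energy.

    feats: (num_tokens, dim) intermediate features from calibration data.
    token_weights: (num_tokens,) non-negative per-token weights (hypothetical scheme).
    Returns (U, eigvals) with U's columns sorted by descending eigenvalue.
    """
    w = token_weights / token_weights.sum()
    # Weighted, uncentered second-moment matrix: sum_i w_i * h_i h_i^T.
    second_moment = (feats * w[:, None]).T @ feats
    eigvals, eigvecs = np.linalg.eigh(second_moment)  # ascending order
    return eigvecs[:, ::-1], eigvals[::-1]


def inject_basis(w_in, b_in, w_out, basis):
    """Fold an orthogonal basis U into two adjacent linear layers.

    Producer: h = w_in @ x + b_in, consumer: y = w_out @ h.
    The producer now emits z = U^T h and the consumer reads z through w_out @ U,
    so w_out @ U @ U^T @ h == w_out @ h whenever U is orthogonal.
    """
    return basis.T @ w_in, basis.T @ b_in, w_out @ basis


# Toy example with random weights (all sizes are arbitrary).
rng = np.random.default_rng(0)
d_model, dim, d_out, n = 16, 8, 16, 256
w_in = rng.standard_normal((dim, d_model))
b_in = rng.standard_normal(dim)
w_out = rng.standard_normal((d_out, dim))
x = rng.standard_normal((n, d_model))      # calibration tokens, one per row
h = x @ w_in.T + b_in                      # intermediate features
weights = rng.uniform(0.1, 1.0, size=n)    # hypothetical token weights

U, var = weighted_pca_basis(h, weights)
w_in2, b_in2, w_out2 = inject_basis(w_in, b_in, w_out, U)

y_ref = h @ w_out.T
y_new = (x @ w_in2.T + b_in2) @ w_out2.T
assert np.allclose(y_ref, y_new)           # full rotation preserves the function

# Sub-network: keep only the k leading components in both layers.
k = 4
y_sub = (x @ w_in2[:k].T + b_in2[:k]) @ w_out2[:, :k].T
```

Because U is orthogonal, the rotated pair of layers computes exactly the same output. Slicing the first k rows of the producer and the first k columns of the consumer yields a narrower sub-network whose weighted reconstruction error of h equals the sum of the discarded eigenvalues. This is one plausible reading of how the leading components form a compact set of critical weights, not a statement of the paper's exact procedure.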



Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning