CascadedViT: Cascaded Chunk-FeedForward and Cascaded Group Attention Vision Transformer
Srivathsan Sivakumar, Faisal Z. Qureshi

TL;DR
CascadedViT introduces a lightweight, efficient vision transformer architecture with a novel feedforward network that achieves high accuracy with low energy consumption, making it suitable for resource-limited devices.
Contribution
The paper proposes Cascaded-ViT (CViT), built around a new Cascaded-Chunk Feed Forward Network (CCFFN) that improves efficiency without loss of accuracy, and introduces the APF metric for evaluating compute efficiency.
Findings
CViT-XL achieves 75.5% Top-1 accuracy on ImageNet-1K.
CViT-XL reduces FLOPs by 15% and energy consumption by 3.3% compared to EfficientViT-M5.
Across model sizes, CViT models show the lowest energy consumption of the compared models and outperform them on APF scores.
Abstract
Vision Transformers (ViTs) have demonstrated remarkable performance across a range of computer vision tasks; however, their high computational, memory, and energy demands hinder deployment on resource-constrained platforms. In this paper, we propose Cascaded-ViT (CViT), a lightweight and compute-efficient vision transformer architecture featuring a novel feedforward network design called Cascaded-Chunk Feed Forward Network (CCFFN). By splitting input features, CCFFN improves parameter and FLOP efficiency without sacrificing accuracy. Experiments on ImageNet-1K show that our CViT-XL model achieves 75.5% Top-1 accuracy while reducing FLOPs by 15% and energy consumption by 3.3% compared to EfficientViT-M5. Across various model sizes, the CViT family consistently exhibits the lowest energy consumption, making it suitable for deployment on battery-constrained devices…
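To illustrate why splitting input features saves parameters and FLOPs, here is a minimal NumPy sketch. It assumes, by analogy with cascaded group attention, that a "cascaded chunk" FFN splits the channels into equal chunks, gives each chunk its own small two-layer FFN, and adds each chunk's output to the next chunk's input. The class `CascadedChunkFFN`, the helpers `ffn_params` and `chunked_ffn_params`, and all dimensions are hypothetical illustrations, not the authors' implementation. Biases and normalization are ignored.

```python
import numpy as np


def ffn_params(dim, expansion):
    # Standard FFN: dim -> expansion*dim -> dim (weights only, biases ignored).
    hidden = expansion * dim
    return dim * hidden + hidden * dim


def chunked_ffn_params(dim, expansion, num_chunks):
    # Hypothetical chunked FFN: one independent FFN per chunk of dim // num_chunks channels.
    chunk = dim // num_chunks
    return num_chunks * ffn_params(chunk, expansion)


def relu(x):
    return np.maximum(x, 0.0)


class CascadedChunkFFN:
    """Hypothetical sketch (not the paper's code): channels are split into
    chunks, and each chunk's FFN input is its own slice plus the previous
    chunk's output, so later chunks see progressively refined features."""

    def __init__(self, dim, expansion, num_chunks, seed=0):
        assert dim % num_chunks == 0, "dim must divide evenly into chunks"
        rng = np.random.default_rng(seed)
        self.num_chunks = num_chunks
        chunk = dim // num_chunks
        hidden = expansion * chunk
        self.w1 = [rng.standard_normal((chunk, hidden)) * 0.02 for _ in range(num_chunks)]
        self.w2 = [rng.standard_normal((hidden, chunk)) * 0.02 for _ in range(num_chunks)]

    def __call__(self, x):
        # x has shape (tokens, dim); split along the channel axis.
        chunks = np.split(x, self.num_chunks, axis=-1)
        outputs = []
        prev = np.zeros_like(chunks[0])
        for i, c in enumerate(chunks):
            inp = c + prev  # cascade: previous chunk's output feeds the next chunk
            out = relu(inp @ self.w1[i]) @ self.w2[i]
            outputs.append(out)
            prev = out
        return np.concatenate(outputs, axis=-1)

    def num_params(self):
        return sum(w.size for w in self.w1) + sum(w.size for w in self.w2)


if __name__ == "__main__":
    dim, expansion, num_chunks = 384, 2, 4
    print(ffn_params(dim, expansion))                      # 589824
    print(chunked_ffn_params(dim, expansion, num_chunks))  # 147456
    layer = CascadedChunkFFN(dim, expansion, num_chunks)
    print(layer(np.ones((49, dim))).shape)                 # (49, 384)
```

In this idealized per-layer count, splitting into g chunks divides the FFN weights (and matrix-multiply FLOPs) by g, since g · (d/g)² = d²/g. The reduction the paper reports for CViT-XL (15% fewer FLOPs than EfficientViT-M5) is for the whole model, so it should not be expected to match this toy ratio.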
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors
