CascadedViT: Cascaded Chunk-FeedForward and Cascaded Group Attention Vision Transformer

Srivathsan Sivakumar; Faisal Z. Qureshi

arXiv:2511.14111·cs.CV·November 25, 2025

Open Access

TL;DR

CascadedViT is a lightweight, efficient vision transformer architecture built around a novel feedforward network design. It combines competitive accuracy with low energy consumption, making it suitable for resource-limited devices.

Contribution

The paper proposes Cascaded-ViT (CViT), built on a new Cascaded-Chunk Feed Forward Network (CCFFN) design that improves parameter and FLOP efficiency without sacrificing accuracy, and introduces the APF metric for evaluating compute efficiency.

Findings

01

CViT-XL achieves 75.5% Top-1 accuracy on ImageNet-1K.

02

CViT-XL reduces FLOPs by 15% and energy consumption by 3.3% compared to EfficientViT-M5.

03

Across model sizes, the CViT family shows the lowest energy consumption among the compared models and leads on APF scores.

Abstract

Vision Transformers (ViTs) have demonstrated remarkable performance across a range of computer vision tasks; however, their high computational, memory, and energy demands hinder deployment on resource-constrained platforms. In this paper, we propose Cascaded-ViT (CViT), a lightweight and compute-efficient vision transformer architecture featuring a novel feedforward network design called Cascaded-Chunk Feed Forward Network (CCFFN). By splitting input features, CCFFN improves parameter and FLOP efficiency without sacrificing accuracy. Experiments on ImageNet-1K show that our CViT-XL model achieves 75.5% Top-1 accuracy while reducing FLOPs by 15% and energy consumption by 3.3% compared to EfficientViT-M5. Across various model sizes, the CViT family consistently exhibits the lowest energy consumption, making it suitable for deployment on battery-constrained devices…
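
The abstract says only that CCFFN gains parameter and FLOP efficiency "by splitting input features". As a rough illustration of why channel splitting can help, the sketch below counts weights for a plain two-layer feedforward block and for a hypothetical variant in which the channels are divided into equal chunks, each with its own smaller feedforward block. The function names, the width of 192, and the expansion ratio of 2 are assumptions chosen for the example, not values from the paper, and the sketch does not model the cascading between chunks or any other detail of the actual CCFFN.

```python
def ffn_weights(dim: int, expansion: int) -> int:
    """Weights in a two-layer FFN (dim -> expansion*dim -> dim), biases ignored."""
    hidden = dim * expansion
    return dim * hidden + hidden * dim


def chunked_ffn_weights(dim: int, expansion: int, num_chunks: int) -> int:
    """Weights when channels are split into equal chunks, one small FFN per chunk.

    Hypothetical layout for illustration only; not the paper's exact design.
    """
    if dim % num_chunks != 0:
        raise ValueError("dim must be divisible by num_chunks")
    return num_chunks * ffn_weights(dim // num_chunks, expansion)


# Illustrative sizes (assumed, not taken from the paper).
dense = ffn_weights(dim=192, expansion=2)
chunked = chunked_ffn_weights(dim=192, expansion=2, num_chunks=2)
print(dense, chunked, dense / chunked)
```

Under these assumptions, splitting into k equal chunks divides the feedforward weight count by k, because each chunk's weight matrices shrink quadratically with width while the number of chunks grows only linearly. Whether accuracy is preserved depends on design details that the abstract does not spell out.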

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors