EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation

Longfei Liu; Yongjie Hou; Yang Li; Qirui Wang; Youyang Sha; Yongjun Yu; Yinzhi Wang; Peizhe Ru; Xuanlong Yu; Xi Shen

arXiv:2603.18739·cs.CV·March 30, 2026

TL;DR

EdgeCrafter is a compact ViT framework that uses task-specific distillation to deliver efficient, accurate dense prediction on resource-limited edge devices, outperforming CNN-based models on several tasks.

Contribution

The paper proposes a unified compact ViT framework that combines task-specific distillation with an edge-friendly encoder-decoder design, improving dense prediction accuracy at edge-scale model sizes.

Findings

01 ECDet-S achieves 51.7 AP on COCO with fewer than 10M parameters.

02 ECInsSeg performs comparably to RF-DETR with fewer parameters.

03 ECPose-X reaches 74.8 AP, surpassing YOLO26Pose-X.

Abstract

Deploying high-performance dense prediction models on resource-constrained edge devices remains challenging due to strict limits on computation and memory. In practice, lightweight systems for object detection, instance segmentation, and pose estimation are still dominated by CNN-based architectures such as YOLO, while compact Vision Transformers (ViTs) often struggle to achieve a similarly strong accuracy-efficiency tradeoff, even with large-scale pretraining. We argue that this gap is largely due to insufficient task-specific representation learning in small-scale ViTs, rather than an inherent mismatch between ViTs and edge dense prediction. To address this issue, we introduce EdgeCrafter, a unified compact ViT framework for edge dense prediction centered on ECDet, a detection model built from a distilled compact backbone and an edge-friendly encoder-decoder design. On the COCO dataset,…
