TL;DR
EdgeCrafter introduces a compact ViT framework with task-specific distillation, enabling efficient and accurate dense prediction on resource-limited edge devices, outperforming traditional CNNs in several tasks.
Contribution
The paper proposes a unified compact ViT framework with task-specific distillation and edge-aware design, significantly improving edge dense prediction performance.
Findings
ECDet-S achieves 51.7 AP on COCO with fewer than 10M parameters.
ECInsSeg performs comparably to RF-DETR with fewer parameters.
ECPose-X reaches 74.8 AP, surpassing YOLO26Pose-X.
Abstract
Deploying high-performance dense prediction models on resource-constrained edge devices remains challenging due to strict limits on computation and memory. In practice, lightweight systems for object detection, instance segmentation, and pose estimation are still dominated by CNN-based architectures such as YOLO, while compact Vision Transformers (ViTs) often struggle to achieve a similarly strong accuracy-efficiency trade-off, even with large-scale pretraining. We argue that this gap is largely due to insufficient task-specific representation learning in small-scale ViTs, rather than an inherent mismatch between ViTs and edge dense prediction. To address this issue, we introduce EdgeCrafter, a unified compact ViT framework for edge dense prediction centered on ECDet, a detection model built from a distilled compact backbone and an edge-friendly encoder-decoder design. On the COCO dataset,…
Code & Models
- 🤗 Intellindust/ECDet_S (model) · 27 dl · ♡ 2
- 🤗 Intellindust/ECDet_M (model) · 16 dl · ♡ 2
- 🤗 Intellindust/ECDet_L (model) · 30 dl · ♡ 3
- 🤗 Intellindust/ECDet_X (model) · 34 dl · ♡ 2
- 🤗 Intellindust/ECSeg_S (model) · 12 dl · ♡ 2
- 🤗 Intellindust/ECSeg_M (model) · 13 dl · ♡ 2
- 🤗 Intellindust/ECSeg_L (model) · 19 dl · ♡ 2
- 🤗 Intellindust/ECSeg_X (model) · 18 dl · ♡ 2
- 🤗 Intellindust/ECPose_S (model) · 16 dl · ♡ 2
- 🤗 Intellindust/ECPose_M (model) · 15 dl · ♡ 2
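The checkpoints listed above could be fetched programmatically along the following lines. This is a minimal sketch, assuming the listed names are Hugging Face Hub model repository ids and that the `huggingface_hub` package is installed. The helper names `repo_id`, `all_repo_ids`, and `fetch` are hypothetical and not part of any EdgeCrafter release, and the sketch makes no assumption about which files each repository contains or how the weights are loaded.

```python
from pathlib import Path
from typing import Optional

ORG = "Intellindust"

# Families and sizes exactly as they appear in the list above.
# The list may be incomplete; only these entries are assumed to exist.
LISTED = {
    "ECDet": ["S", "M", "L", "X"],
    "ECSeg": ["S", "M", "L", "X"],
    "ECPose": ["S", "M"],
}


def repo_id(family: str, size: str) -> str:
    """Build a Hub repository id such as 'Intellindust/ECDet_S'."""
    if size not in LISTED.get(family, []):
        raise ValueError(f"{family}_{size} is not in the listed checkpoints")
    return f"{ORG}/{family}_{size}"


def all_repo_ids() -> list:
    """Return every repository id from the list, in listing order."""
    return [repo_id(f, s) for f, sizes in LISTED.items() for s in sizes]


def fetch(family: str, size: str, cache_dir: Optional[str] = None) -> Path:
    """Download a full repository snapshot and return its local path.

    Requires network access; the import is deferred so the helpers
    above work without huggingface_hub installed.
    """
    from huggingface_hub import snapshot_download

    local = snapshot_download(repo_id=repo_id(family, size), cache_dir=cache_dir)
    return Path(local)
```

Calling `fetch("ECDet", "S")` would download the smallest detection checkpoint into the local Hub cache. Loading the weights afterwards depends on the authors' code, which this page does not describe.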
