E^2VTS: Energy-Efficient Video Text Spotting from Unmanned Aerial Vehicles
Zhenyu Hu, Zhenyu Wu, Pengcheng Pi, Yunhe Xue, Jiayi Shen, Jianchao Tan, Xiangru Lian, Zhangyang Wang, and Ji Liu

TL;DR
E^2VTS is an energy-efficient video text spotting system for UAVs that combines optimized training, multi-stage image processing, and model pruning to balance performance and energy consumption.
Contribution
The paper introduces an energy-efficient video text spotting approach tailored to UAVs, comprising a revisited crop & resize training strategy, a multi-stage image processor, and pruning and quantization for deployment on Raspberry Pi.
Findings
Achieves a more competitive tradeoff between energy efficiency and performance than previous methods.
Effective pruning and quantization enable deployment on Raspberry Pi.
Empirical validation on UAV-captured video datasets.
Abstract
Unmanned Aerial Vehicle (UAV) based video text spotting has been extensively used in civil and military domains. The limited battery capacity of UAVs motivates us to develop an energy-efficient video text spotting solution. In this paper, we first revisit RCNN's crop & resize training strategy and empirically find that it outperforms aligned RoI sampling on a real-world video text dataset captured by UAV. To reduce energy consumption, we further propose a multi-stage image processor that takes the redundancy, continuity, and mixed degradation of videos into account. Lastly, the model is pruned and quantized before being deployed on Raspberry Pi. Our proposed energy-efficient video text spotting solution, dubbed E^2VTS, outperforms all previous methods by achieving a competitive tradeoff between energy efficiency and performance. All our code and pre-trained models are available at…
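The abstract says the multi-stage image processor exploits video redundancy but does not describe how. As a rough illustration only, the sketch below shows one common way redundancy can save energy: skip frames that barely differ from the last frame that was processed. The gating rule, the function names `mean_abs_diff` and `select_keyframes`, and the `threshold` value are all assumptions made for this example, not the authors' method.

```python
from typing import List, Optional, Sequence

# A grayscale frame as rows of 0-255 intensities (assumed toy representation).
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_keyframes(frames: Sequence[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames worth sending to the (expensive) text spotter.

    Hypothetical gating rule: the first frame is always kept; a later frame is
    kept only if its mean absolute difference from the most recently kept
    frame exceeds `threshold`.
    """
    kept: List[int] = []
    reference: Optional[Frame] = None
    for i, frame in enumerate(frames):
        if reference is None or mean_abs_diff(frame, reference) > threshold:
            kept.append(i)
            reference = frame
    return kept


# Four tiny 2x2 frames: frames 1 and 3 are near-duplicates of frames 0 and 2.
demo_frames = [
    [[0, 0], [0, 0]],
    [[1, 1], [1, 1]],
    [[50, 50], [50, 50]],
    [[52, 52], [52, 52]],
]
print(select_keyframes(demo_frames, threshold=10.0))  # prints [0, 2]
```

In this toy run only two of the four frames reach the downstream model. A real pipeline would also have to handle the continuity and mixed degradation mentioned in the abstract, which this sketch does not attempt.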
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Face recognition and analysis
